Abstract

We improve the existing achievable rate regions for causal and for zero-delay source coding of
stationary Gaussian sources under an average mean squared error (MSE) distortion measure. To begin
with, we find a closed-form expression for the information-theoretic causal rate-distortion function (RDF)
under such distortion measure, denoted by $R_c^{it}(D)$, for first-order Gauss-Markov processes. $R_c^{it}(D)$
is a lower bound to the optimal performance theoretically attainable (OPTA) by any causal source
code, namely $R_c^{op}(D)$. We show that, for Gaussian sources, the latter can also be upper bounded as
$R_c^{op}(D) < R_c^{it}(D) + 0.5\log_2(2\pi e)$ bits/sample. In order to analyze $R_c^{it}(D)$ for arbitrary zero-mean
Gaussian stationary sources, we introduce $\overline{R_c^{it}}(D)$, the information-theoretic causal RDF when the
reconstruction error is jointly stationary with the source. Based upon $\overline{R_c^{it}}(D)$, we derive three closed-form
upper bounds to the additive rate loss defined as $\overline{R_c^{it}}(D) - R(D)$, where $R(D)$ denotes Shannon's RDF.
Two of these bounds are strictly smaller than 0.5 bits/sample at all rates. These bounds differ from one
another in their tightness and ease of evaluation; the tighter the bound, the more involved its evaluation.
We then show that, for any source spectral density and any positive distortion $D \le \sigma_x^2$, $\overline{R_c^{it}}(D)$ can be
realized by an AWGN channel surrounded by a unique set of causal pre-, post-, and feedback filters. We
show that finding such filters constitutes a convex optimization problem. In order to solve the latter, we
propose an iterative optimization procedure that yields the optimal filters and is guaranteed to converge
to $\overline{R_c^{it}}(D)$. Finally, by establishing a connection to feedback quantization we design a causal and a
zero-delay coding scheme which, for Gaussian sources, achieve an operational rate lower than $\overline{R_c^{it}}(D) + 0.254$
and $\overline{R_c^{it}}(D) + 1.254$ bits/sample, respectively. This implies that the OPTA among all zero-delay source
codes, denoted by $R_{zd}^{op}(D)$, is upper bounded as $R_{zd}^{op}(D) < \overline{R_c^{it}}(D) + 1.254 \le R(D) + 1.754$ bits/sample.



Index Terms 

Causality, rate-distortion theory, entropy coded dithered quantization, noise-shaping, differential pulse- 
code modulation (DPCM), sequential coding, convex optimization. 

I. Introduction 

In zero-delay source coding, the reconstruction of each input sample must take place at the same time
instant the corresponding input sample has been encoded. Zero-delay source coding is desirable in many
applications, e.g., in real-time applications where one cannot afford to have large delays [1], or in systems
involving feedback, in which the current input depends on the previous outputs [2]-[4]. A weaker notion
closely related to the principle behind zero-delay codes is that of causal source coding, wherein the
reproduction of the present source sample depends only on the present and past source samples but not
on the future source samples [5], [6]. This notion does not preclude the use of non-causal entropy coding,
and thus it does not guarantee zero-delay reconstruction. Nevertheless, any zero-delay source code must
also be causal.

It is known that, in general, causal codes cannot achieve the rate-distortion function (RDF) $R(D)$ of
the source, which is the optimal performance theoretically attainable (OPTA) in the absence of causality
constraints [5]. However, it is in general not known how close to $R(D)$ one can get when restricting
attention to the class of causal or zero-delay source codes, except, for causal codes, when dealing with
memory-less sources [5], stationary sources at high resolution [6], or first-order Gauss-Markov sources
under a per-sample MSE distortion metric [2].

For the case of memory-less sources, it was shown by Neuhoff and Gilbert that the optimum rate-
distortion performance of causal source codes is achieved by time-sharing at most two memory-less scalar
quantizers (followed by entropy coders) [5]. In this case, the rate loss due to causality was shown to
be given by the space-filling loss of the quantizers, i.e., the loss is at most $(1/2)\log_2(2\pi e/12)$ ($\approx 0.254$)
bits/sample. For the case of Gaussian stationary sources with memory and MSE distortion, Gorbunov and
Pinsker showed that the information-theoretic$^1$ causal RDF, here denoted by $R_c^{it}(D)$ and to be defined

$^1$ Here and in the sequel, the term "information theoretic" refers to the use of mutual information as a measure of the rate.
formally in Section II, tends to Shannon's RDF as the distortion goes to zero [7], [8]. The possible gap
between the OPTA of causal source codes and this information-theoretic causal RDF was not assessed. On
the other hand, for arbitrary stationary sources with finite differential entropy and under high-resolution
conditions, it was shown in [6] that the rate-loss of causal codes (i.e., the difference between their OPTA
and Shannon's RDF) is at most the space-filling loss of a uniform scalar quantizer. With the exception
of memory-less sources and first-order Gauss-Markov sources, the "price" of causality at general rate
regimes for other stationary sources remains an open problem. However, it is known that for any source,
the mutual information rates across an additive white Gaussian noise (AWGN) channel and across a scalar
ECDQ channel do not exceed $R(D)$ by more than 0.5 and 0.754 bits per sample, respectively [10], [11].
This immediately yields the bounds $R_c^{it}(D) \le R(D) + 0.5$ and $R_c^{op}(D) \le R(D) + 0.754$.
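
As a quick numerical check of the constants quoted above, the following sketch evaluates the space-filling loss of a uniform scalar quantizer in bits, taking it to be $(1/2)\log_2(2\pi e/12)$, and adds it to the 0.5 bits/sample AWGN gap. The variable names are ours and are introduced only for illustration.

```python
import math

# Space-filling loss of a uniform scalar quantizer, in bits/sample:
# (1/2) * log2(2*pi*e / 12).
space_filling_loss_bits = 0.5 * math.log2(2.0 * math.pi * math.e / 12.0)

# Gap to R(D) of the mutual information rate across an AWGN channel,
# as quoted in the text (at most 0.5 bits/sample).
awgn_gap_bits = 0.5

# Gap to R(D) across a scalar ECDQ channel: AWGN gap plus space-filling loss.
ecdq_gap_bits = awgn_gap_bits + space_filling_loss_bits

print(f"space-filling loss: {space_filling_loss_bits:.4f} bits/sample")
print(f"ECDQ gap to R(D):   {ecdq_gap_bits:.4f} bits/sample")
```

The two values agree with the 0.254 and 0.754 bits/sample figures used in the text up to rounding.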

In causal source coding it is generally difficult to provide a constructive proof of achievability since 
Shannon's random codebook construction, which relies upon jointly encoding long sequences of source 
symbols, is not directly applicable even in the case of memory-less sources. Thus, even if one could 
obtain an outer bound for the achievable region based on an information theoretic RDF, finding the inner 
bound, i.e., the OPTA, would still remain a challenge. 

There exist other results related to the information-theoretic causal RDF, in which achievability is not
addressed. The minimum sum rate necessary to sequentially encode and decode two scalar correlated
random variables under a coupled fidelity criterion was studied in [12]. A closed-form expression for this
minimum rate is given in [12, Theorem 4] for the special case of a squared error distortion measure and a
per-variable (as opposed to a sum or average) distortion constraint. In [2], the minimum rate for causally
encoding and decoding source samples (under per-sample or average distortion constraints) was given
the name sequential rate-distortion function (SRDF). Under a per-sample MSE distortion constraint $D$,
it was also shown in [2, p. 187] that for a first-order Gauss-Markov source $x(k+1) = a_1 x(k) + \xi(k)$,
where $\{\xi(k)\}$ is a zero-mean white Gaussian process with variance $\sigma_\xi^2$, the information-theoretic SRDF$^2$
$R_{SRD}^{it}(D)$ takes the form



$$R_{SRD}^{it}(D) = \max\left\{0,\ \frac{1}{2}\log_2\left(a_1^2 + \frac{\sigma_\xi^2}{D}\right)\right\} \text{ bits/sample}, \tag{1}$$



for all $D > 0$.$^3$ No expressions are known for $R_{SRD}^{it}(D)$ for higher-order Gauss-Markov sources. Also,
with the exception of memory-less Gaussian sources, $R_c^{it}(D)$, with its average MSE distortion constraint

$^2$ The information-theoretic SRDF is the one defined in [2, Def. 5.3.1], where it is denoted by R^(D).
$^3$ It has not been established whether (1) is achievable or how close one can get to it.
(weaker than a per-sample MSE constraint), has not been characterized.
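
To make the expression in (1) concrete, the following minimal sketch evaluates it numerically, reading (1) as $\max\{0, \tfrac{1}{2}\log_2(a_1^2 + \sigma_\xi^2/D)\}$. The function name and argument names are ours, not part of the original formulation.

```python
import math

def srdf_first_order_bits(a1, var_xi, D):
    """Evaluate the right-hand side of (1) in bits/sample.

    a1     : AR coefficient of the first-order Gauss-Markov source
    var_xi : variance of the white Gaussian innovations
    D      : per-sample MSE distortion constraint (must be positive)
    """
    if D <= 0:
        raise ValueError("D must be positive")
    return max(0.0, 0.5 * math.log2(a1 * a1 + var_xi / D))

# Example: a source with a1 = 0.9 and unit innovation variance.
for D in (0.05, 0.5, 5.0):
    print(D, srdf_first_order_bits(0.9, 1.0, D))
```

For a memory-less source ($a_1 = 0$) the expression reduces to the familiar $\max\{0, \tfrac{1}{2}\log_2(\sigma_\xi^2/D)\}$.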

In this paper, we improve the existing inner and outer rate-distortion bounds for causal and for zero-
delay source coding of zero-mean Gaussian stationary sources and average MSE distortion. We start
by showing that, for any zero-mean Gaussian source with bounded differential entropy rate, the causal
OPTA exceeds $R_c^{it}(D)$ by less than approximately 0.254 bits/sample. Then we revisit the SRDF problem
for first-order Gauss-Markov sources under a per-sample distortion constraint schedule and find the
explicit expression for the corresponding RDF by means of an alternative, constructive derivation. This
expression, which turns out to differ from the one found in [2, bottom of p. 186], allows us to show that
for first-order Gauss-Markov sources, the information-theoretic causal RDF $R_c^{it}(D)$ for an average (as
opposed to per-sample) distortion measure coincides with (1). In order to upper bound $R_c^{it}(D)$ for general
Gaussian stationary sources, we introduce the information-theoretic causal RDF when the distortion is
jointly stationary with the source and denote it by $\overline{R_c^{it}}(D)$. We then derive three closed-form upper
bounding functions to the rate-loss $\overline{R_c^{it}}(D) - R(D)$, which can be applied to any stationary Gaussian
random process. Two of these bounds are, at all rates, strictly tighter than the best previously known
general bound of 0.5 bits/sample. Since, by definition, $R_c^{it}(D) \le \overline{R_c^{it}}(D)$, we have that

$$R_c^{it}(D) - R(D) \overset{(a)}{\le} \overline{R_c^{it}}(D) - R(D), \tag{2}$$

and thus all three bounding functions also upper bound the gap $R_c^{it}(D) - R(D)$. As we shall see,
equality holds in (a) if $R_c^{it}(D)$ could be realized by a test channel with distortion jointly stationary with
the source, which seems a reasonable conjecture for stationary sources.

We do not provide a closed-form expression for $\overline{R_c^{it}}(D)$ (except for first-order Gauss-Markov sources),
and thus the upper bound on the right-hand-side (RHS) of (2) (the tightest bound discussed in this
paper) is not evaluated analytically for the general case. However, we propose an iterative procedure
that can be implemented numerically and which allows one to evaluate $\overline{R_c^{it}}(D)$, for any source power
spectral density (PSD) and $D > 0$, with any desired accuracy. This procedure is based upon the iterative
optimization of causal pre-, post- and feedback-filters around an AWGN channel. A key result in this
paper (and its second main contribution) is showing that this filter optimization problem is convex
in the frequency responses of all the filters. This guarantees that the mutual information rate between
source and reconstruction yielded by our iterative procedure converges monotonically to $\overline{R_c^{it}}(D)$ as the
number of iterations and the order of the filters tend to infinity. This equivalence between the solution
to a convex filter design optimization problem and $\overline{R_c^{it}}(D)$ avoids the troublesome minimization over
mutual informations, thus making it possible to actually compute $\overline{R_c^{it}}(D)$ in practice, for general Gaussian
stationary sources. We then make the link between $\overline{R_c^{it}}(D)$ and the OPTA of causal and zero-delay
codes. More precisely, when the AWGN channel is replaced by a subtractively dithered uniform scalar
quantizer followed by memory-less entropy coding, the filters obtained with the iterative procedure yield
a causal source coding system whose operational rate is below $\overline{R_c^{it}}(D) + (1/2)\log_2(2\pi e)$ bits/sample.
If the entropy coder in this system is restricted to encode quantized values individually (as opposed to
long sequences of them), then this system achieves zero-delay operation with an operational rate below
$\overline{R_c^{it}}(D) + (1/2)\log_2(2\pi e) + 1$ bits/sample. This directly translates into an upper bound to the OPTA of
zero-delay source codes, namely $R_{zd}^{op}(D)$. To illustrate our results, we present an example for a zero-mean
AR-1 and a zero-mean AR-2 Gaussian source, for which we evaluate the closed-form bounds and obtain
an approximation of $\overline{R_c^{it}}(D)$ numerically by applying the iterative procedure proposed herein.

This paper is organized as follows: In Section II we review some preliminary notions. We prove
in Section III that the OPTA for Gaussian sources does not exceed the information-theoretic RDF by
more than approximately 0.254 bits per sample. Section IV contains the derivation of a closed-form
expression for $R_c^{it}(D)$ for first-order Gauss-Markov sources. In Section V we formally introduce $\overline{R_c^{it}}(D)$
and derive the three closed-form upper bounding functions for the information-theoretic rate-loss of
causality. Section VI presents the iterative procedure to calculate $\overline{R_c^{it}}(D)$, after presenting the proof
of convexity that guarantees its convergence. The two examples are provided in Section VII. Finally,
Section VIII draws conclusions. (Most of the proofs of our results are given in Sections IX to XVI.)

Notation 

$\mathbb{R}$ and $\mathbb{R}_0^+$ denote, respectively, the set of real numbers and the set of non-negative real numbers.
$\mathbb{Z}$ and $\mathbb{Z}^+$ denote, respectively, the sets of integers and positive integers. We use non-italic lower
case letters, such as x, to denote scalar random variables, and boldface lower-case and upper-case
letters to denote vectors and matrices, respectively. We use $A^{\dagger}$, $\mathrm{span}\{A\}$ and $\mathcal{N}\{A\}$ to denote the
Moore-Penrose pseudo-inverse, the column span and the null space of the matrix $A$, respectively. The
expectation operator is denoted by $E[\,\cdot\,]$. The notation $\sigma_x^2$ refers to the variance of x. The notation $\{x(k)\}_{k=1}^{\infty}$
describes a one-sided random process, which may also be written simply as $\{x(k)\}$. We write $x^k$ to
refer to the sequence $\{x(i)\}_{i=1}^{k}$. The PSD of a wide-sense stationary process $\{x(k)\}$ is denoted by
$S_x(e^{j\omega})$, $\omega \in [-\pi, \pi]$. Notice that $\sigma_x^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_x(e^{j\omega})\,d\omega$. For any two functions $f, g : [-\pi, \pi] \to \mathbb{C}$,
$f, g \in L_2$, we write the standard squared norm and inner product as $\|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\omega)|^2\,d\omega$ and
$\langle f, g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega) g(\omega)^*\,d\omega$, respectively, where $*$ denotes complex conjugation. For one-sided random
processes $\{x(k)\}$ and $\{y(k)\}$, the term $\bar{I}(\{x(k)\};\{y(k)\}) = \limsup_{k\to\infty} \frac{1}{k} I(x^k; y^k)$ denotes the mutual
information rate between $\{x(k)\}$ and $\{y(k)\}$, provided the limit exists. Similarly, for a stationary random
process $\{x(k)\}$, $\bar{h}(\{x(k)\}) = \lim_{k\to\infty} h(x(k) \mid x^{k-1})$ denotes the differential entropy rate of $\{x(k)\}$.

II. Preliminaries 

A source encoder-decoder (ED) pair encodes a source $\{x(k)\}_{k=-\infty}^{\infty}$ into binary symbols, from which
a reconstruction $\{y(k)\}_{k=1}^{\infty}$ of $\{x(k)\}_{k=1}^{\infty}$ is generated. The end-to-end effect of any ED pair can be
described by a series of reproduction functions $\{f_k\}_{k=1}^{\infty}$ such that, for every $k \in \mathbb{Z}^+$,

$$y_1^k = f_k(x_{-\infty}^{\infty}), \tag{3}$$

where we write $y_1^k$ as a short notation for $\{y(j)\}_{j=1}^{k}$. Following [5], we say that an ED pair is causal if
and only if it satisfies the following definition:

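Definition 1 (Causal Source Coder): An ED pair is said to be causal if and only if its reproduction
functions are such that

$$f_k(x_{-\infty}^{\infty}) = f_k(\tilde{x}_{-\infty}^{\infty}), \quad \text{whenever } x_{-\infty}^{k} = \tilde{x}_{-\infty}^{k}, \quad \forall k \in \mathbb{Z}^+.$$

▲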

It also follows from Definition 1 that an ED pair is causal if and only if the following Markov chain
holds for every possible random input process $\{x(k)\}$:

$$x_{k+1}^{\infty} \leftrightarrow x_{-\infty}^{k} \leftrightarrow y_1^{k}, \quad \forall k \in \mathbb{Z}^+. \tag{4}$$

It is worth noting that if the reproducing functions are random, then this equivalent causality constraint
must require that (4) is satisfied for each realization of the reproducing functions $\{f_k\}_{k=1}^{\infty}$.

Let $L_k(x_{-\infty}^{\infty})$ be the total number of bits that the decoder has received when it generates the output
subsequence $y_1^k$. Define $b(k) \in \{0, 1\}^{L_k}$ as the random binary sequence that contains the bits that the
decoder has received when $y_1^k$ is generated. Notice that $b(k)$ is, in general, a function of all source samples,
since the binary coding may be non-causal, i.e., $y_1^k$ may be generated only after the decoder has received
enough bits to reproduce $y_1^m$, with $m > k$. We highlight the fact that even though $b(k)$ may contain bits
which depend on samples $x(\ell)$ with $\ell > k$, the random sequences $x_{-\infty}^{\infty}$ and $y_1^{\infty}$ may still satisfy (4), i.e.,
the ED pair can still be causal. Notice also that $L_k(x_{-\infty}^{\infty})$ is a random variable, which depends on $x_{-\infty}^{\infty}$,
the functions $\{f_k\}$ and on the manner in which the source is encoded into the binary sequence sent to
the decoder.

For further analysis, we define the average operational rate of an ED pair as

$$r(\{x(k)\},\{y(k)\}) \triangleq \limsup_{k\to\infty} \frac{1}{k} E\left[L_k(x_{-\infty}^{\infty})\right]. \tag{5}$$

In the sequel, we focus only on the MSE as the distortion measure. Accordingly, we define the average
distortion associated with an ED pair as:

$$d(\{x(k)\},\{y(k)\}) \triangleq \limsup_{k\to\infty} \frac{1}{k} E\left[\|x_1^k - y_1^k\|^2\right]. \tag{6}$$



The above notions allow us to define the operational causal RDF as follows: 

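Definition 2: The Operational Causal Rate-Distortion Function for a source $\{x(k)\}$ is defined as [5]:

$$R_c^{op}(D) \triangleq \inf r(\{x(k)\},\{y(k)\}), \tag{7}$$

where the infimum is over all $\{y(k)\}$ such that $y(k) = f_k(x^k)$, $\forall k \in \mathbb{Z}^+$, with $\{f_k\}$ causal and
$d(\{x(k)\},\{y(k)\}) \le D$.

▲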

We note that the operational causal rate distortion function defined above corresponds to the OPTA of 
all causal ED pairs. 

In order to find a meaningful information-theoretical counterpart of $R_c^{op}(D)$, we note from [13,
Theorem 5.3.1] that

$$\frac{1}{k} E\left[L_k(x_{-\infty}^{\infty})\right] \ge \frac{1}{k} H(b(k)), \quad \forall k \in \mathbb{Z}^+. \tag{8}$$

Also, from the Data Processing Inequality [13], it follows immediately that

$$H(b(k)) = I(b(k); b(k)) \ge I(x_{-\infty}^{\infty}; y_1^k) \ge I(x_{-\infty}^{k}; y_1^k), \tag{9}$$

where the last inequality turns into equality for a causal ED pair, since in that case (4) holds. Thus,
combining (5), (8) and (9),

$$r(\{x(k)\},\{y(k)\}) \ge \limsup_{k\to\infty} \frac{1}{k} I(x_{-\infty}^{k}; y_1^k) = \bar{I}(\{x(k)\};\{y(k)\}). \tag{10}$$

This lower bound motivates the study of an information-theoretic causal rate distortion function, as defined 
below. 

Definition 3: The Information-Theoretic Causal Rate-Distortion Function for a source {x(k)}, with 
respect to the average MSE distortion measure, is defined as 

$$R_c^{it}(D) \triangleq \inf \bar{I}(\{x(k)\};\{y(k)\}),$$

where the infimum is over all processes $\{y(k)\}$ such that $d(\{x(k)\},\{y(k)\}) \le D$ and such that (4)
holds. ▲
The above definition is a special case of the non-anticipative epsilon-entropy introduced by Pinsker and
Gorbunov, which was shown to converge to Shannon's RDF, for Gaussian stationary sources and in the
limit as the rate goes to infinity [8].
In the non-causal case, it is known that for any source and for any single-letter distortion measure,
the OPTA equals the information-theoretic RDF [13]. Unfortunately, such a strong equivalence between
the OPTA and the information-theoretic RDF does not seem to be possible in the causal case (i.e., for
$R_c^{it}(D)$). (One exception is if one is to jointly and causally encode an asymptotically large number of
parallel Gaussian sources, in which case $R_c^{it}(D)$ can be shown to coincide with the OPTA of causal
codes.) Nevertheless, as outlined in Section I, it is possible to obtain lower and upper bounds to the
OPTA of causal codes from $R_c^{it}(D)$. Indeed, and to begin with, since $R_c^{it}(D) \ge R(D)$, it follows directly
from (7) and (10) that

$$R_c^{op}(D) \ge R_c^{it}(D) \ge R(D). \tag{11}$$

The last inequality in (11) is strict, in general, and becomes equality when the source is white or when
the rate tends to infinity. Also, as will be shown in Section III, for Gaussian sources $R_c^{op}(D)$ does not
exceed $R_c^{it}(D)$ by more than approximately 0.254 bits/sample, and thus an upper bound to $R_c^{op}(D)$ can
be obtained from $R_c^{it}(D)$.

For completeness, and for future reference, we recall that for any MSE distortion $D > 0$, the RDF
for a stationary Gaussian source with PSD $S_x(e^{j\omega})$ is equal to the associated information-theoretic RDF,
given by the "reverse water-filling" equations Q 

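$$R(D) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \max\left\{0,\ \frac{1}{2}\log_2\left(\frac{S_x(e^{j\omega})}{\theta}\right)\right\} d\omega \tag{12a}$$

$$D = \frac{1}{2\pi}\int_{-\pi}^{\pi} \min\left\{\theta,\ S_x(e^{j\omega})\right\} d\omega. \tag{12b}$$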
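
The reverse water-filling equations can be evaluated numerically by searching for the water level $\theta$ that meets the distortion target in (12b) and then integrating (12a). The sketch below is one simple way of doing so, assuming a midpoint grid on $[-\pi, \pi]$ and a bisection search on $\theta$; the function names, the grid size and the AR(1) example PSD are our own illustrative choices.

```python
import math

def ar1_psd(a, var_xi):
    """Return the PSD of the stationary AR(1) process x(k+1) = a x(k) + xi(k)."""
    def psd(w):
        # |1 - a e^{-jw}|^2 = 1 - 2 a cos(w) + a^2
        return var_xi / (1.0 - 2.0 * a * math.cos(w) + a * a)
    return psd

def reverse_water_filling(psd, D, n=4096, iters=200):
    """Solve (12b) for theta by bisection and evaluate (12a).

    psd : callable mapping omega in [-pi, pi] to S_x(e^{j omega}) > 0
    D   : target MSE distortion, assumed to satisfy 0 < D <= source variance
    Returns (R, theta) with R in bits/sample.
    """
    # Midpoint grid; (1/2pi) * integral over [-pi, pi] becomes a plain average.
    ws = [-math.pi + (i + 0.5) * 2.0 * math.pi / n for i in range(n)]
    S = [psd(w) for w in ws]

    def distortion(theta):
        return sum(min(theta, s) for s in S) / n

    lo, hi = 0.0, max(S)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if distortion(mid) < D:
            lo = mid      # water level too low: distortion below target
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    R = sum(max(0.0, 0.5 * math.log2(s / theta)) for s in S) / n
    return R, theta

if __name__ == "__main__":
    R, theta = reverse_water_filling(ar1_psd(0.9, 1.0), D=0.1)
    print(f"theta = {theta:.6f}, R(D) = {R:.6f} bits/sample")
```

When $D$ does not exceed the minimum of the PSD, the water level equals $D$ and the result reduces to $\tfrac{1}{2}\log_2(\sigma_\xi^2/D)$ for the AR(1) example, which gives a convenient sanity check.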

Although in general it is not known by how much $R_c^{it}(D)$ exceeds $R(D)$, for Gaussian stationary
sources one can readily find an upper bound for $R_c^{it}(D)$ in the quadratic Gaussian RDF for source-
uncorrelated distortion, defined as [14]

$$R^{\perp}(D) \triangleq \inf_{\{y(k)\}} \bar{I}(\{x(k)\};\{y(k)\}), \tag{13}$$

where the infimum is taken over all output processes $\{y(k)\}$ consistent with MSE $\le D$ and such that the
reconstruction error $\{y(k) - x(k)\}$ is uncorrelated with the source. More precisely, it is shown in [14]
that this RDF, given by

$$R^{\perp}(D) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left(\frac{\sqrt{S_x(e^{j\omega}) + \alpha} + \sqrt{S_x(e^{j\omega})}}{\sqrt{\alpha}}\right) d\omega, \tag{14a}$$

wherein $\alpha > 0$ is the only scalar that satisfies




(14b) 



can be realized causally. 

More generally, it is known that, for any source, the mutual information across an AWGN channel
(which satisfies (4)) introducing noise with variance $D$, say $R_{AWGN}(D)$, exceeds Shannon's RDF $R(D)$
by at most 0.5 bits/sample, see, e.g., [11]. Thus, we have:

$$R_c^{it}(D) \le R_{AWGN}(D) \le R(D) + 0.5 \text{ bits/sample}, \quad \forall D > 0. \tag{15}$$

Until now it has been an open question whether a bound tighter than (15) can be obtained for sources with
memory and at general rate regimes [10]. In Sections IV, V and VI we show that for Gaussian sources
this is indeed the case. But before focusing on upper bounds for $R_c^{it}(D)$, its operational importance will
be established by showing in the following section that, for Gaussian sources, the OPTA does not exceed
$R_c^{it}(D)$ by more than approximately 0.254 bits/sample.

III. Upper Bounds to $R_c^{op}(D)$ from $R_c^{it}(D)$

In this section we show that, for any Gaussian source $\{x(k)\}$ and $D > 0$, an upper bound to $R_c^{op}(D)$ can
be readily obtained from $R_c^{it}(D)$ by adding (approximately) 0.254 bits per sample to $R_c^{it}(D)$. This result
is first formally stated and proved for finite subsequences of any Gaussian source. Then, it is extended
to Gaussian stationary processes.

We start with two definitions.

Definition 4: The causal information theoretic RDF for a zero-mean Gaussian random vector of length
$\ell$ is defined as

$$R_c^{it(\ell)}(D) \triangleq \inf \frac{1}{\ell} I(x; y) \tag{16}$$

where the infimum is taken over all output vectors satisfying the causality constraint

$$y_1^{k} \leftrightarrow x_1^{k} \leftrightarrow x_{k+1}^{\ell}, \quad \forall k = 1, \ldots, \ell - 1 \tag{17}$$

and the distortion constraint

$$d(x, y) \triangleq \frac{1}{\ell} E\left[\|y - x\|^2\right] \le D. \tag{18}$$

▲
Definition 5: The operational causal RDF for a zero-mean Gaussian random vector of length $\ell$ is
defined as

$$R_c^{op(\ell)}(D) \triangleq \inf r(x_1^{\ell}, y_1^{\ell}), \tag{19}$$

where the infimum is over all $y_1^{\ell}$ such that $y(k) = f_k(x^k)$, $\forall k = 1, \ldots, \ell$, with $\{f_k\}$ causal and
$d(x, y) \le D$.

▲

We will also need the following result [14, Lemma 1]:

Lemma 1: Let $x \in \mathbb{R}^{\ell} \sim \mathcal{N}(0, K_x)$. Let $z \in \mathbb{R}^{\ell}$ and $z_G \in \mathbb{R}^{\ell}$ be two random vectors with zero mean
and the same covariance matrix, i.e., $K_z = K_{z_G}$, and having the same cross-covariance matrix with
respect to $x$, that is, $K_{xz} = K_{xz_G}$. If $z_G$ and $x$ are jointly Gaussian, and if $z$ has any distribution, then

$$I(x; x + z) \ge I(x; x + z_G). \tag{20}$$

If furthermore $|K_{x+z}| > 0$, then equality is achieved in (20) if and only if $z \sim \mathcal{N}(0, K_z)$ with $z$ and $x$
being jointly Gaussian. ▲
Notice that if one applies Lemma 1 to a reconstruction error with which the output sequence satisfies the
causality constraint (17), then the Gaussian version of the same reconstruction error will also produce an
output causally related with the input. More precisely, if a given reconstruction error $z$ satisfies (17), then,
for all $j \le k < i \le N$, it holds that
$0 = E[x(i)(y(j) - E[y(j) \mid x^k])] = E[x(i)(x(j) + z(j) - E[x(j) + z(j) \mid x^k])] = E[x(i)(z(j) - E[z(j) \mid x^k])]$.
Since $z_G$, $x$ have the same joint second-order statistics as $z$, $x$, it follows that
$E[x(i)(y_G(j) - E[y_G(j) \mid x^k])] = 0$, $\forall j \le k < i \le N$. This, together with the fact that $z_G$ is jointly Gaussian
with $x$, implies that also the reconstructed sequence $\{y_G(k)\} = \{x(k) + z_G(k)\}$ satisfies the causality
constraint (17).

We are now in the position to state the first main result of this section: 

Lemma 2: For any zero-mean Gaussian random vector source of length $\ell$ having bounded differential
entropy, and for every $D > 0$,

$$R_c^{op(\ell)}(D) < R_c^{it(\ell)}(D) + \frac{1}{2}\log_2(2\pi e) \text{ bits/sample}. \tag{21}$$

▲

The proof of Lemma 2 is presented in Section IX.

The result stated in Lemma [2] for Gaussian random vector sources is extended to Gaussian stationary 
processes in the following theorem (the second main result of this section): 

Theorem 1: For a zero-mean Gaussian stationary source $\{x(k)\}$, and $D > 0$,

$$R_c^{op}(D) < R_c^{it}(D) + \frac{1}{2}\log_2(2\pi e). \tag{22}$$

▲

The proof of Theorem 1 can be found in Section X.

The fact that $R_c^{it}(D) + (1/2)\log_2(2\pi e) > R_c^{op}(D)$ for Gaussian sources allows one to find upper bounds
to the OPTA of causal codes by explicitly finding or upper bounding $R_c^{it}(D)$. This is accomplished in
the following sections.

IV. $R_c^{it}(D)$ for First-Order Gauss-Markov Processes

In this section we will find $R_c^{it}(D)$ when the source is a first-order Gauss-Markov process. More
precisely, we will show that the information-theoretic causal RDF $R_c^{it}(D)$, which is associated with an
average distortion constraint, coincides with the expression for the SRDF on the RHS of (1) obtained
in [2] for a per-sample distortion constraint. To do so, and to provide also a constructive method of
realizing the SRDF as well as $R_c^{it}(D)$, we will start by stating an alternative derivation of the SRDF
defined in [2].

Before proceeding, it will be convenient to introduce some additional notation. For any process $\{x(k)\}$,
we write $x_j^k$, $j \le k$, to denote the random column vector $[x(j) \cdots x(k)]^T$ and adopt the shorter notation
$x_k = x(k)$. For any two random vectors $x_j^k$, $y_m^n$, we define $K_{x_j^k} \triangleq E[x_j^k (x_j^k)^T]$, $K_{y_m^n x_j^k} \triangleq E[y_m^n (x_j^k)^T]$.

It was already stated in Lemma 1 that the reconstruction process $y^{\ell}$ which realizes the minimum mutual
information for any given MSE distortion constraint must be jointly Gaussian with the source. This holds in
particular for a realization of the SRDF with distortion schedule $D_1, \ldots, D_{\ell}$. In the next theorem we will obtain an
explicit expression for this RDF and prove that in its realization, the sample distortions $E[(y(k) - x(k))^2]$
equal the effective distortions $\{d_k\}_{k=1}^{\ell}$, defined as

$$d_1 \triangleq \min\left\{\sigma_{x_1}^2,\ D_1\right\} \tag{23a}$$

$$d_k \triangleq \min\left\{a_{k-1}^2 d_{k-1} + \sigma_{\xi(k-1)}^2,\ D_k\right\}, \quad \forall k = 2, \ldots, \ell. \tag{23b}$$
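
The recursion that defines the effective distortions is straightforward to evaluate. The following minimal sketch reads (23) as $d_1 = \min\{\sigma_{x_1}^2, D_1\}$ and $d_k = \min\{a_{k-1}^2 d_{k-1} + \sigma_{\xi(k-1)}^2, D_k\}$; the function name and the list-based argument layout are our own choices.

```python
def effective_distortions(var_x1, a, var_xi, D):
    """Compute the effective distortions d_1, ..., d_l of (23a)-(23b).

    var_x1 : variance of the first source sample x(1)
    a      : list [a_1, ..., a_{l-1}] of AR coefficients
    var_xi : list of innovation variances for xi(1), ..., xi(l-1)
    D      : list [D_1, ..., D_l] with the distortion schedule
    """
    if len(a) != len(D) - 1 or len(var_xi) != len(D) - 1:
        raise ValueError("a and var_xi must have one element fewer than D")
    d = [min(var_x1, D[0])]                      # (23a)
    for k in range(1, len(D)):                   # (23b); list index k is sample k+1
        predicted = a[k - 1] ** 2 * d[k - 1] + var_xi[k - 1]
        d.append(min(predicted, D[k]))
    return d

# Example: the second constraint is active, the first and third are not.
print(effective_distortions(1.0, [0.5, 0.5], [0.75, 0.75], [2.0, 0.5, 10.0]))
```

A constraint $D_k$ that is larger than the variance of the one-step prediction error is inactive, in which case the effective distortion is set by the prediction error rather than by $D_k$.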

Moreover, it will be shown that the unique second-order statistics of this realization are given by the 
following recursive algorithm: 



May 3, 2011 DRAFT



Procedure 1

Step 0: Set $E[y_1^2] = E[y_1 x_1] = E[x_1^2] - d_1$.
Step 1: Set the counter $k = 2$.

Step 2: Set $E[y_1^{k-1} x_k] = K_{y_1^{k-1} x_1^{k-1}} (K_{x_1^{k-1}})^{-1} E[x_1^{k-1} x_k]$.
Step 3: Set $E[y_1^{k-1} y_k] = E[y_1^{k-1} x_k]$.
Step 4: Set $E[y_k^2] = E[y_k x_k] = E[x_k^2] - d_k$.

Step 5: Enlarge $K_{y_1^{k-1}}$ to $K_{y_1^{k}}$ by appending the column $E[y_1^{k-1} y_k]$ and the row $E[y_1^{k} y_k]^T$,
calculated in steps 3 and 4.
Step 6: Set $E[y_k (x_1^{k-1})^T]$ as



EMxLif 



E[ylx fc ] T 



E[yLi*fc] 



Efy^xfe 



E[xi_ lXfc ] 2 



(24) 



Step 7: Put together $K_{y_1^{k-1} x_1^{k-1}}$, $E[y_k (x_1^{k-1})^T]$, $E[y_1^{k-1} x_k]$ and $E[y_k x_k]$ to obtain $K_{y_1^{k} x_1^{k}}$.
Step 8: Increment $k$ by 1 and go to Step 2.



Figure 1 illustrates the operation of the above recursive procedure. After $k - 1$ iterations, the covariance
sub-matrices $K_{y_1^{k-1} x_1^{k-1}}$, $K_{y_1^{k-1}}$ have been found. At the $k$-th iteration, step $i$ is responsible for revealing
the partial rows and columns indicated by number $i$ in the figure.



Figure 1. Illustration of the recursive Procedure 1 at its $k$-th iteration. Starting from known covariance matrices
$K_{y_1^{k-1} x_1^{k-1}}$, $K_{y_1^{k-1}}$, their next partial rows and columns are found. The numbers indicate the step in the
algorithm which reveals the corresponding part of the matrix.

The above results are formally stated in the following theorem, which also gives an exact expression 
for the SRDF of first-order Gauss-Markov sources. 
Theorem 2: Let $\{x(k)\}_{k=1}^{\ell}$ be a first-order Gauss-Markov source of the form

$$x(k+1) = a_k x(k) + \xi(k), \quad k = 1, \ldots, \ell - 1, \tag{25}$$

where $x(1)$ and the innovations $\{\xi(k)\}_{k=1}^{\ell-1}$ are independent zero-mean Gaussian random variables with
variances $\sigma_{x_1}^2$ and $\{\sigma_{\xi(k)}^2\}_{k=1}^{\ell-1}$, respectively. Then, the sequential rate distortion function (SRDF) for
$\{x(k)\}_{k=1}^{\ell}$ under distortion schedule $\{D_k\}_{k=1}^{\ell}$ is given by

$$R_{SRD}^{it}(D_1, \ldots, D_{\ell}) = \frac{1}{2\ell}\ln\left(\frac{\sigma_{x_1}^2}{d_1}\right) + \frac{1}{2\ell}\sum_{k=2}^{\ell}\ln\left(\frac{a_{k-1}^2 d_{k-1} + \sigma_{\xi(k-1)}^2}{d_k}\right), \tag{26}$$

where the effective distortions $\{d_k\}_{k=1}^{\ell}$ are defined in (23). The unique second-order statistics of a
realization of $R_{SRD}^{it}(D_1, \ldots, D_{\ell})$ for this source are obtained by the recursive algorithm described in Procedure 1. ▲
The proof of this theorem can be found in Section XI.

Remark 1: The expression for the SRDF with per-sample distortion constraints in (26) differs from
the one found in [2, p. 186] for the source (25) with $a_k = a$, $\forall k = 1, \ldots, \ell$, which in our notation reads

$$R_{SRD}(D_1, \ldots, D_{\ell}) = \frac{1}{\ell}\sum_{k=1}^{\ell}\max\left\{0,\ \frac{1}{2}\log\left(\frac{a^2 D_{k-1} + \sigma_{\xi(k-1)}^2}{D_k}\right)\right\}, \tag{27}$$

wherein $D_0 = 0$ and $\sigma_{\xi(0)}^2 = \sigma_{x(1)}^2$. The difference lies in that the logarithms in (26) contain the effective
distortions $\{d_k\}_{k=1}^{\ell}$, whereas (27) uses the distortion constraints $\{D_k\}_{k=1}^{\ell}$ themselves. It is likely that
the author of [2], on page 186, intended these distortion constraints to be the effective distortions, i.e.,
that $E[(y(k) - x(k))^2] = D_k$, for every $k = 1, \ldots, \ell$. However, in [2, Definition 5.3.5 on p. 147],
the SRDF under a distortion schedule is defined as the infimum of a mutual information rate subject
to the constraints $E[(y(k) - x(k))^2] \le D_k$. Under the latter interpretation, nothing precludes one from
choosing an arbitrarily large value for, say, $D_1$, yielding an arbitrarily large value for the second term in
the summation on the RHS of (27), which is, of course, inadequate. ▲

We are now in a position to find the expression for $R_c^{it}(D)$ for first-order Gauss-Markov sources. This
is done in the following theorem, whose proof is contained in Section XII.

Theorem 3: For a stationary Gaussian process

$$x(k+1) = a\,x(k) + \xi(k), \quad k = 1, \ldots \tag{28}$$

where $\{\xi(k)\}$ is an i.i.d. sequence of zero-mean Gaussian random variables with variance $\sigma_\xi^2$, $x(1) \sim
\mathcal{N}(0, \sigma_x^2)$ with $\sigma_x^2 = \sigma_\xi^2/(1 - a^2)$, the information-theoretic causal RDF is given by

$$R_c^{it}(D) = \frac{1}{2}\ln\left(a^2 + \frac{\sigma_\xi^2}{D}\right). \tag{29}$$

▲

The technique applied to prove Theorems 2 and 3 does not seem to be extendable to Gauss-Markov
processes of order greater than 1. In the sequel, we will find upper bounds to $R_c^{it}(D)$ for arbitrary (any
order) stationary Gaussian sources.

V. Closed-Form Upper Bounds 

In order to upper bound the difference between $R_c^{it}(D)$ and $R(D)$ for arbitrary stationary Gaussian
sources, we will start this section by defining an upper bounding function for $R_c^{it}(D)$, denoted by $\overline{R_c^{it}}(D)$.
We will then derive three closed-form upper bounding functions to the rate-loss $\overline{R_c^{it}}(D) - R(D)$, applicable
to any Gaussian stationary process. Two of these bounds are strictly smaller than 0.5 bits/sample for all
distortions $0 < D < \sigma_x^2$.

We begin with the following definition: 

Definition 6 (Causal Stationary RDF): For a stationary source $\{x(k)\}$, the information-theoretic Causal
Stationary Rate-Distortion Function $\overline{R_c^{it}}(D)$ is defined as

$$\overline{R_c^{it}}(D) \triangleq \inf \bar{I}(\{x(k)\};\{y(k)\}),$$

where the infimum is over all processes $\{y(k)\}$ such that:

i) $d(\{x(k)\},\{y(k)\}) \le D$,

ii) the reconstruction error $\{z(k)\} = \{y(k)\} - \{x(k)\}$ is jointly stationary with the source, and

iii) Markov chain (4) holds.

▲

Next we derive three closed-form upper bounding functions to $\overline{R_c^{it}}(D) - R(D)$ that are applicable to
arbitrary zero-mean stationary Gaussian sources with finite differential entropy rate. This result is stated
in the following theorem, proved in Section XIII.

Theorem 4: Let $\{x(k)\}$ be a zero-mean Gaussian stationary source with PSD $S_x(e^{j\omega})$ with bounded
differential entropy rate and variance $\sigma_x^2$. Let $R(D)$ denote Shannon's RDF for $\{x(k)\}$ (given by (12)),
and let $R^{\perp}(D)$ denote the quadratic Gaussian RDF for source-uncorrelated distortions for the source
$\{x(k)\}$ defined in (13). Let $R_c^{it}(D)$ denote the information-theoretic causal RDF (see Definition 3). Then,
for all $D \in (0, \sigma_x^2)$,

$$R_c^{it}(D) - R(D) \le \overline{R_c^{it}}(D) - R(D) \le B_1(D) \le B_2(D) \le B_3(D) \le 0.5 \text{ bits/sample}, \tag{30}$$
where



$$B_1(D) \triangleq R^{\perp}(D) - R(D) \tag{31}$$

82(D) 4 i- r lo g2 ( 1 + [1 - il^P") ^ " (32) 

^ 2 



Bs(D) 4 min | i log 2 ((1 + £) [l + - ^) £> 



, 0.5 , - log 2 I ^ ) }> , 03) 



1 /-TT 1 

?x " 7T" / r o , ■ NT dLJ, (34) 



where 



with $\varepsilon$ being any non-negative scalar for which (34) exists and such that $\varepsilon < D$. ▲

Notice that $B_3(D)$ is independent of $R(D)$, being therefore numerically simpler to evaluate than the
other bounding functions introduced in Theorem 4. However, as $D$ is decreased away from $\sigma_x^2$ and
approaches , B%(D) becomes very loose. In fact, it can be seen from d97a| ) that for D > the 
gap between R^(D) and R(D) is actually upper bounded by B^(D) — R(D), which is of course tighter 
than $B_3(D)$, but requires one to evaluate $R(D)$.
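Evaluating $R(D)$ numerically is inexpensive. The following is a minimal sketch, assuming that Shannon's RDF for a Gaussian stationary source is computed through the standard reverse water-filling equations on a uniform frequency grid. The function name `shannon_rdf`, the grid size and the bisection depth are illustrative choices, not taken from the paper.

```python
import numpy as np

def shannon_rdf(psd, D, iters=200):
    """Reverse water-filling on a uniform frequency grid (illustrative helper).

    psd : samples of S_x(e^{jw}) on a uniform grid over [-pi, pi)
    D   : target MSE, 0 < D < mean(psd)
    Returns (rate in bits/sample, water level theta).
    """
    psd = np.asarray(psd, dtype=float)
    lo, hi = 0.0, psd.max()
    for _ in range(iters):                      # bisection on the water level
        theta = 0.5 * (lo + hi)
        if np.mean(np.minimum(theta, psd)) < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    # (1/4pi) * integral of max(0, log2(S_x/theta)) = 0.5 * grid average
    rate = 0.5 * np.mean(np.maximum(0.0, np.log2(psd / theta)))
    return rate, theta

w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
psd_ar1 = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * w)) ** 2   # AR-1 PSD, coefficient 0.9
rate, theta = shannon_rdf(psd_ar1, D=0.1)
```

For distortions below the minimum of the PSD the water level equals $D$, so the rate reduces to $\tfrac{1}{2}\log_2(1/D)$ for this unit-innovation AR-1 example.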

It is easy to see that time-sharing between two causal realizations with distortions $D_1$, $D_2$ and
rates $R_c^{it}(D_1)$, $R_c^{it}(D_2)$ yields an output process which satisfies causality with a rate-distortion pair
corresponding to the linear combination of $R_c^{it}(D_1)$, $R_c^{it}(D_2)$. Thus, in some cases one could get a bound
tighter than $B_3$ by considering the boundary of the convex hull of the region above $R(D) + B_3(D)$ and
then subtracting $R(D)$. However, such a bound would be much more involved to compute, since it requires
evaluating not only $R(D)$, but also the already mentioned convex hull.

It is also worth noting that the first term within the min operator on the RHS of (33) becomes smaller
when <j e — 1/a 2 . is reduced. This difference, which from Jensen's inequality is always non-negative, could 
be taken as a measure of the "non-flatness" of the PSD of $\{x(k)\}$ (especially when $\varepsilon = 0$). Indeed, as
$\{x(k)\}$ approaches a white process, $B_3$ tends to zero.

It can be seen from (30) that $\overline{R_c^{it}}(D)$ provides the tightest upper bound for the information-theoretic
causal RDF among all bounds presented so far. Although it does not seem to be feasible to obtain a closed-form
expression for $\overline{R_c^{it}}(D)$, we show in the next section how to get arbitrarily close to it.

VI. Obtaining $\overline{R_c^{it}}(D)$

In this section we present an iterative procedure that allows one to calculate $\overline{R_c^{it}}(D)$ with arbitrary
accuracy, for any $D > 0$. In addition, we will see that this procedure yields a characterization of the
filters in a dithered feedback quantizer [15] that achieve an operational rate which is upper bounded by
$\overline{R_c^{it}}(D) + 0.254$ [bits/sample].



A. An Equivalent Problem 

To derive the results mentioned above, we will work on a scheme consisting of an AWGN channel
and a set of causal filters, as depicted in Fig. 2. In this scheme, the source $\{x(k)\}$ is Gaussian and
stationary, with PSD $S_x(e^{j\omega})$, and is assumed to have finite differential entropy rate. In Fig. 2, the noise
$\{n(k)\}$ is a zero-mean Gaussian process with i.i.d. samples, independent of $\{x(k)\}$. Thus, between $v(k)$
and $w(k)$ lies the AWGN channel $w(k) = v(k) + n(k)$. The filter $F(z)$ is stable and strictly causal,
i.e., it has at least a one-sample delay. The filters $A(z)$ and $B(z)$ are causal and stable. The idea, to be
developed in the remainder of this section, is to first show that with the filters that minimize the variance
of the reconstruction error for a fixed ratio $\sigma_w^2/\sigma_n^2$, the system of Fig. 2 attains a mutual information
rate between source and reconstruction equal to $\overline{R_c^{it}}(D)$, with a reconstruction MSE equal to $D$. We will
then show that finding such filters is a convex optimization problem, which naturally suggests an iterative
procedure to solve it.

Figure 2. AWGN channel within a "perfect reconstruction" system followed by causal de-noising filter $W(z)$.

In order to analyze the system in Fig. 2, and for notational convenience, we define

$$\Omega_x(e^{j\omega}) \triangleq \sqrt{S_x(e^{j\omega})}, \quad \forall \omega \in [-\pi, \pi].$$

We also restrict the filters $A(z)$ and $B(z)$ to satisfy the "perfect reconstruction" condition

$$A(e^{j\omega})B(e^{j\omega}) = 1. \tag{35}$$

Thus,

$$y(k) = W(z)x(k) + W(z)B(z)[1 - F(z)]n(k), \tag{36}$$

see Fig. 2. Therefore, $W(z)$ is the signal transfer function of the system.
The perfect reconstruction condition (35) induces a division of roles in the system, which will later
translate into a convenient parametrization of the optimization problem associated with it. On the one
hand, because of (35), the net effect of the AWGN channel and the filters $A(z)$, $B(z)$ and $F(z)$ is to
introduce (coloured) Gaussian stationary additive noise, namely $\{u(k)\}$, independent of the source. The
PSD of this noise, $S_u(e^{j\omega})$, is given by

$$S_u(e^{j\omega}) = |W(e^{j\omega})|^2\, |B(e^{j\omega})|^2\, |1 - F(e^{j\omega})|^2\, \sigma_n^2. \tag{37}$$

The diagram in Figure 3 shows how the signal transfer function $W(z)$ and the noise transfer function
$W(z)B(z)(1 - F(z))$ act upon $\{x(k)\}$ and $\{n(k)\}$ to yield the output process.



Figure 3. Equivalent block diagram depicting the output as the sum of $W(z)x(k)$ and $u(k)$, where $\{n(k)\}$ is an i.i.d. zero-mean
Gaussian process independent of $\{x(k)\}$.

On the other hand, by looking at Fig. 2 one can see that $W(z)$ also plays the role of a de-noising
filter, which can be utilized to reduce additive noise at the expense of introducing linear distortion.
More precisely, $W(z)$ acts upon the Gaussian stationary source $\{x(k)\}$ corrupted by additive Gaussian
stationary noise with PSD $|B(e^{j\omega})|^2\, |1 - F(e^{j\omega})|^2\, \sigma_n^2$. From (36) and Fig. 2, the MSE is given by

$$D_c \triangleq \sigma_u^2 + \|(W - 1)\Omega_x\|^2 = \sigma_n^2\, \|W B f\|^2 + \|(W - 1)\Omega_x\|^2, \tag{38}$$

where $\sigma_u^2 \triangleq \frac{1}{2\pi}\int_{-\pi}^{\pi} S_u(e^{j\omega})\, d\omega$ and

$$f(\omega) \triangleq |1 - F(e^{j\omega})|, \quad \forall \omega \in [-\pi, \pi].$$

On the RHS of (38), the first term is the variance of the additive, source-independent, Gaussian noise.
The second term corresponds to the error due to linear distortion, that is, from the deviation of $W(e^{j\omega})$
from a unit gain.

Since we will be interested in minimizing $D_c$, for any given $F(z)$ and $W(z)$, the filters $A(z)$ and
$B(z)$ in Fig. 2 are chosen so as to minimize $\sigma_u^2$ in (38), while still satisfying (35). From the viewpoint of
the subsystem comprised of the filters $A(z)$, $B(z)$ and $F(z)$ and the AWGN channel, $W(z)$ acts as an
error frequency weighting filter, see (37). Thus, for any $F(z)$ and $W(z)$, the filters $A(z)$ and $B(z)$ that
minimize $\sigma_u^2$ are those characterized in [15, Prop. 1], by setting $P(z)$ in [15, eq. (20b)] equal to $W(z)$.
With the minimizer filters in [15], the variance of the source-independent error term is given by

$$\sigma_u^2 = \frac{\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} |W(e^{j\omega})|\, \Omega_x(e^{j\omega})\, f(\omega)\, d\omega\right)^2}{K - \|f\|^2}, \tag{39}$$

where $K$ denotes the ratio $\sigma_w^2/\sigma_n^2$ across the AWGN channel.

On the other hand, the filter $F(z)$ needs to be strictly causal and stable. As a consequence, it holds that

$$\int_{-\pi}^{\pi} \log f(\omega)\, d\omega \ge 0,$$

which follows from Jensen's formula [16] (see also the Bode Integral Theorem in, e.g., [17]).
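The sign of this log-integral can be checked numerically. The sketch below is only an illustration: it evaluates the normalized integral $\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln|1 - F(e^{j\omega})|\, d\omega$ by a Riemann sum for two hypothetical first-order choices of $1 - F(z)$, one with its zero inside the unit circle and one with its zero outside. The helper name `log_integral` and the two filters are assumptions made for the example.

```python
import numpy as np

def log_integral(taps, n=1 << 14):
    """(1/2pi) * integral over [-pi, pi] of ln|1 - F(e^{jw})|, via a Riemann sum.

    taps : impulse response of 1 - F(z); the first tap is 1 because
           F(z) is strictly causal.
    """
    w = 2 * np.pi * np.arange(n) / n
    k = np.arange(len(taps))
    H = np.exp(-1j * np.outer(w, k)) @ np.asarray(taps, dtype=float)
    return np.mean(np.log(np.abs(H)))

min_phase = [1.0, -0.5]       # 1 - F(z) = 1 - 0.5 z^{-1}, zero at z = 0.5
non_min_phase = [1.0, -2.0]   # 1 - F(z) = 1 - 2 z^{-1},   zero at z = 2

val_min = log_integral(min_phase)          # close to 0
val_non_min = log_integral(non_min_phase)  # close to ln 2
```

The integral vanishes when all zeros of $1 - F(z)$ lie inside the unit circle and is strictly positive otherwise, which is the content of the inequality above.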

Thus, from (38) and (39), if one wishes to minimize the reconstruction MSE by choosing appropriate
causal filters in the system in Fig. 2 for a given value of $K$, one needs to solve the following optimization
problem:

Optimization Problem 1: For any given $\Omega_x(e^{j\omega})$, and for any given $K > 1$, find the frequency response
$W$ and the frequency response magnitude $f(\omega)$ that

Minimize: $D_c \triangleq \dfrac{\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} |W(e^{j\omega})|\, \Omega_x(e^{j\omega})\, f(\omega)\, d\omega\right)^2}{K - \|f\|^2} + \|(W - 1)\Omega_x\|^2$ (40a)

Subject to: $W \in \mathcal{H}$,

$\int_{-\pi}^{\pi} \ln f(\omega)\, d\omega \ge 0$,

where $\mathcal{H}$ denotes the space of all frequency responses that can be realized with causal filters. ▲
Now we can establish the equivalence between solving Optimization Problem 1 and finding $\overline{R_c^{it}}(D)$.
Lemma 3: For any $K > 1$ and $\Omega_x(e^{j\omega})$, if the filters $A^*(z)$, $B^*(z)$, and $F^*(z)$ solve Optimization
Problem 1 and yield distortion $D^*$, then

$$\tfrac{1}{2}\log_2(K) = \overline{R_c^{it}}(D^*).$$

▲

From the above lemma, whose proof can be found in Section XIV, one can find $\overline{R_c^{it}}(D)$ either by
solving the minimization in Definition 6 or by solving Optimization Problem 1. In the following, we
will pursue the latter approach. As we shall see, our formulation of Optimization Problem 1 provides a
convenient parametrization of its decision variables. In fact, it makes it possible to establish the convexity
of the cost functional defined in (40a) with respect to the set of all causal frequency responses involved.
That result can be obtained directly from the following key lemma, proved in Section XV.
Lemma 4: Define the sets of functions

$$\mathcal{F}_K \triangleq \{f : [-\pi, \pi] \to \mathbb{R}^+,\ \|f\|^2 < K\},$$
$$\mathcal{G} \triangleq \{G : [-\pi, \pi] \to \mathbb{C}\},$$

where $K$ is some positive constant. Then, for any $G \in \mathcal{G}$ and $K > 1$, the cost functional
$J : \mathcal{F}_K \times \mathcal{G} \to \mathbb{R}_0^+$, defined as

$$J(f, g) \triangleq \frac{\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega)\, |g(\omega)|\, d\omega\right)^2}{K - \|f\|^2} + \|g - G\|^2, \tag{41}$$

is strictly convex in $f$ and $g$. ▲
We can now prove the convexity of Optimization Problem 1.

Lemma 5: For all $\Omega_x$ and for all $K > 1$, Optimization Problem 1 is convex. ▲
Proof: With the change of variables $G = \Omega_x$ and $g = \Omega_x W$ in (41), we obtain $D_c = J(f, g)$,
see (38). With this, Optimization Problem 1 amounts to finding the functions $f$ and $g$ that

Minimize: $J(f, g)$ (42a)
Subject to: $g \in \mathcal{W}$, $f \in \mathcal{B}$, (42b)

where

$$\mathcal{W} \triangleq \{g = \Omega_x W : W \in \mathcal{H}\} \tag{43}$$

$$\mathcal{B} \triangleq \left\{f \in \mathcal{F}_K : \int_{-\pi}^{\pi} \ln f(\omega)\, d\omega = 0\right\}.$$

Clearly, the space of frequency responses associated with causal transfer functions, $\mathcal{H}$, is a convex set.
This implies that $\mathcal{W}$ is a convex set. In addition, $\mathcal{B}$ is also a convex set, and from Lemma 4, $J(f, g)$ is a
convex functional. Therefore, the optimization problem stated in (42), and thus Optimization Problem 1,
are convex. This completes the proof. ■

B. Finding $\overline{R_c^{it}}(D)$ Numerically

Lemma 5 and the parametrization in Optimization Problem 1 allow one to define an iterative algorithm
that, as will be shown later, yields the information-theoretic causal RDF. Such an algorithm is embodied in
Iterative Procedure 2:
Iterative Procedure 2

For any target information-theoretic rate $R$,
Step 1: Set $K = 2^{2R}$.
Step 2: Set $W(e^{j\omega}) = 1$.

Step 3: Find the frequency response magnitude $f \in \mathcal{B}$ that minimizes $D_c$ for given $W$.
Step 4: Find the causal frequency response $W \in \mathcal{H}$ that minimizes $D_c$ for given $f$.
Step 5: Return to Step 3.



Notice that after solving Step 3 in the first iteration of Procedure 2, the MSE is comprised of only
additive noise independent of the source.⁴ Step 4 then reduces the MSE by attenuating source-independent
noise at the expense of introducing linear distortion. Each step reduces the MSE until a local (or global)
minimum of the MSE is obtained. Based upon the convexity of Optimization Problem 1, the following
theorem, which is the main technical result in this section, guarantees convergence to the global minimum
of the MSE, say $D$, for a given end-to-end mutual information. Since all the filters in Optimization
Problem 1 are causal, the mutual information achieved at this global minimum is equal to $\overline{R_c^{it}}(D)$.

Theorem 5 (Convergence of Iterative Procedure 2): Iterative Procedure 2 converges monotonically to
the unique $f$ and $W$ that realize $\overline{R_c^{it}}(D)$. More precisely, letting $D^{(n)}$ denote the MSE obtained after the
$n$-th iteration of Iterative Procedure 2 aimed at a target rate $R$, we have that

$$n_2 > n_1 \implies D^{(n_2)} \le D^{(n_1)}$$

and

$$\lim_{n \to \infty} \overline{R_c^{it}}(D^{(n)}) = R.$$

▲

Proof: The result follows directly from the fact that Optimization Problem 1 is strictly convex in $f$
and $W$, which was shown in Lemma 4, and from Lemma 3. ■

⁴ Indeed, after solving Step 3 for the first time, the resulting rate is the quadratic Gaussian rate-distortion function for
source-uncorrelated distortions $R^{\perp}(D)$ introduced in [14] (see also (14)).
The above theorem states that the stationary information-theoretic causal RDF can be obtained by
using Iterative Procedure 2. In practice, this means that an approximation arbitrarily close to $\overline{R_c^{it}}(D)$ for
a given $D$ can be obtained if sufficient iterations of the procedure are carried out.
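The alternating structure of Steps 3 to 5 can be illustrated on a toy problem. The sketch below is not the paper's cost functional $D_c$: it swaps in a hypothetical strictly convex quadratic in two scalar variables, purely to show how exact minimization over one block at a time produces a non-increasing cost sequence that converges to the global minimum.

```python
def cost(a, b):
    # Toy strictly convex quadratic (Hessian [[2, 1], [1, 2]]); NOT the paper's D_c.
    return a * a + a * b + b * b - 3.0 * a - 3.0 * b

a, b = 0.0, 0.0            # fixed initial guess for the second block (cf. Step 2)
costs = []
for _ in range(20):
    a = (3.0 - b) / 2.0    # exact minimizer over a for fixed b (cf. Step 3)
    b = (3.0 - a) / 2.0    # exact minimizer over b for fixed a (cf. Step 4)
    costs.append(cost(a, b))
```

In this toy case the iterates approach the unique minimizer $a = b = 1$, where the cost equals $-3$, and most of the improvement occurs in the first few sweeps.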

The feasibility of running Iterative Procedure 2 depends on being able to solve each of the minimization
sub-problems involved in Steps 3 and 4. We next show how these sub-problems can be solved.

Solving Step 3

If $W(e^{j\omega})$ is given, the minimization problem in Step 3 of Iterative Procedure 2 is equivalent to
solving a feedback quantizer design problem with the constraint $A(z)B(z) = 1$, $\forall z \in \mathbb{C}$, and with error
weighting filter $W(e^{j\omega})$. Therefore, the solution to Step 3 is given in closed form by [15, eqs. (20), (29)
and (31b)], where $P(z)$ in [15, eq. (20b)] is replaced by $W(z)$. The latter equations of [15] characterize
the frequency response magnitudes of the optimal $A(z)$, $B(z)$ and $1 - F(z)$ given $W(z)$. The existence
of rational transfer functions $A(z)$, $B(z)$ and $F(z)$ arbitrarily close (in an $L^2$ sense) to such frequency
response magnitudes is also shown in [15].

Solving Step 4

Finding the causal frequency response $W(e^{j\omega}) \in \mathcal{H}$ that minimizes $D_c$ for a given $f$ is equivalent to
solving

$$\min_{g \in \mathcal{W}} J(f, g) \tag{44}$$

for a given $f$, where $\mathcal{W}$ is as defined in (43). Since $\mathcal{W}$ and $J(\cdot,\cdot)$ are convex, (44) is a convex optimization
problem. As such, its global solution can always be found iteratively. In particular, if $W(z)$ is constrained
to be an $M$-th order FIR filter with impulse response $c$ such that $W(e^{j\omega}) = \mathcal{F}\{c\}$, where $\mathcal{F}\{\cdot\}$
denotes the discrete-time Fourier transform, then the map $c \mapsto J(f, \Omega_x \mathcal{F}\{c\})$
is a convex functional. The latter follows directly from the convexity of $J(\cdot,\cdot)$ and the linearity of $\mathcal{F}\{\cdot\}$.
As a consequence, one can solve the minimization problem in Step 4, to any degree of accuracy, by
minimizing this functional over the values of the impulse response of $W(e^{j\omega})$, using standard convex optimization
methods (see, e.g., [18]). This approach also has the benefit of being amenable to numerical computation.

It is interesting to note that if the order of the de-noising filter $W(z)$ were not a priori restricted, then,
after Iterative Procedure 2 has converged to $\overline{R_c^{it}}(D)$, the obtained $W(z)$ is the causal Wiener filter (i.e.,
the MMSE causal estimator) for the noisy signal that comes out of the perfect reconstruction system that
precedes $W(z)$. Notice also that one can get the system in Fig. 2 to yield a realization of Shannon's
$R(D)$ using Iterative Procedure 2 by simply allowing $W(z)$ to be non-causal. This would yield a system
equivalent to the one that was obtained analytically in [10]. An important observation is that one could
not obtain a realization of $\overline{R_c^{it}}(D)$ from such a system in one step by simply replacing $W(z)$ (a non-causal
Wiener filter) by the MMSE causal estimator (that is, a causal Wiener filter). To see this, it suffices to
notice that, in doing so, the frequency response magnitude of $W(z)$ would change. As a consequence,
the previously matched filters $A(z)$, $B(z)$ and $F(z)$ would no longer be optimal for $W(z)$. One would
then have to change $A(z)$, and then $W(z)$ again, and so on, thus having to carry out infinitely many
recursive optimization steps. However, a causally truncated version of the non-causal Wiener filter $W(z)$
that realizes Shannon's RDF could be used as an alternative starting guess in Step 2 of the iterative
procedure.

C. Achieving $\overline{R_c^{it}}(D) + 0.254$ bits/sample Causally

If the AWGN channel in the system of Fig. 2 is replaced by a subtractively dithered uniform scalar
quantizer (SDUSQ), as shown in Fig. 4, then instead of the noise $\{n(k)\}$ we will have an i.i.d. process
independent of $\{x(k)\}$, whose samples are uniformly distributed over the quantization interval [19].
The dither signal, denoted by $\{\nu(k)\}$, is an i.i.d. sequence of uniformly distributed random variables,
independent of the source. Let $\{q(k)\}$ be the quantized output of the SDUSQ. Denote the resulting input
and the output to the quantizer, before adding and after subtracting the dither, respectively, as $\{v'(k)\}$ and
$\{w'(k)\}$, and let $\{n'(k)\} = \{w'(k) - v'(k)\}$ be the quantization noise introduced by the SDUSQ. Notice
that the elements of $\{n'(k)\}$ are independent, both mutually and from the source $\{x(k)\}$. However, unlike
$\{v(k)\}$ and $\{w(k)\}$, the processes $\{v'(k)\}$ and $\{w'(k)\}$ are not Gaussian, since they contain samples of
the uniformly distributed process $\{n'(k)\}$. We then have the following:

Figure 4. Uniform scalar quantizer $Q$ and dither signals $\nu(k)$, $-\nu(k)$, forming an SDUSQ, replacing the AWGN channel of
the system from Fig. 2.

Theorem 6: If the scheme shown in Fig. 4 uses the filters yielded by Iterative Procedure 2, and if long
sequences of the quantized output of this system are entropy coded conditioned to the dither values in a
memoryless fashion, then an operational rate $r_c^{op}$ satisfying

$$r_c^{op} < \overline{R_c^{it}}(D) + \tfrac{1}{2}\log_2\!\left(\tfrac{2\pi e}{12}\right) \tag{45}$$

is achieved causally while attaining a reconstruction MSE equal to $D$. ▲
Proof: If memoryless entropy coding is applied to long sequences of symbols conditioning the
probabilities to dither values, then the operational rate equals the conditional entropy $H(q(k)\,|\,\nu(k))$.
For this entropy, the following holds in the system shown in Fig. 4:

$$\begin{aligned}
H(q(k)\,|\,\nu(k)) &\overset{(a)}{=} I(v'(k); w'(k)) = I(v'(k); v'(k) + n'(k)) = h(v'(k) + n'(k)) - h(n'(k)) \\
&\overset{(b)}{=} h(v(k) + n(k)) - h(n(k)) + \mathcal{D}(n'(k)\,\|\,n(k)) - \mathcal{D}(v'(k) + n'(k)\,\|\,v(k) + n(k)) \\
&< I(v(k); v(k) + n(k)) + \mathcal{D}(n'(k)\,\|\,n(k)) = I(v(k); w(k)) + \tfrac{1}{2}\log_2\!\left(\tfrac{2\pi e}{12}\right) \\
&= \tfrac{1}{2}\log_2 K + \tfrac{1}{2}\log_2\!\left(\tfrac{2\pi e}{12}\right),
\end{aligned} \tag{46}$$

where $H(q(k)\,|\,\nu(k))$ denotes the entropy of $q(k)$ conditioned to the $k$-th value of the dither signal. In
the above, (a) follows from [11, Theorem 1]. In turn, (b) stems from the well-known result $\mathcal{D}(x'\,\|\,x) =
h(x) - h(x')$, which holds when $x$ is Gaussian with the same second-order statistics as $x'$, where $\mathcal{D}(\cdot\,\|\,\cdot)$
denotes the Kullback-Leibler distance, see, e.g., [13, p. 254]. The inequality
in the last line of (46) is strict since the distribution of $v'(k)$ is not Gaussian.

The result follows directly by combining (46) with Lemma 3 and Theorem 5. ■
In view of Theorem 6, and since any ED pair using an SDUSQ and LTI filters yields a reconstruction
error jointly stationary with the source, it follows that the operational rate-distortion performance of
the feedback quantizer thus obtained is within $0.5\log_2(2\pi e/12) \approx 0.254$ bits/sample from the best
performance achievable by any ED pair within this class.
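The two ingredients behind this gap can be checked with a short simulation. The following is a minimal sketch under illustrative assumptions (a unit-variance Gaussian stand-in for the quantizer input, an arbitrary step size `delta`, and a fixed random seed): it verifies that the noise of a subtractively dithered uniform scalar quantizer has variance $\Delta^2/12$ and is essentially uncorrelated with the input, and it evaluates the constant $0.5\log_2(2\pi e/12)$.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5                                            # quantization step (illustrative)
v = rng.normal(0.0, 1.0, size=200_000)                 # stand-in for the quantizer input
nu = rng.uniform(-delta / 2, delta / 2, size=v.size)   # dither, uniform over one cell

q = delta * np.round((v + nu) / delta)   # uniform scalar quantizer applied to v + dither
w = q - nu                               # dither subtracted at the decoder side
noise = w - v                            # quantization noise of the SDUSQ

noise_var = noise.var()                  # close to delta**2 / 12
corr = np.corrcoef(v, noise)[0, 1]       # close to 0

# Divergence (in bits) of a uniform density from a Gaussian density of equal variance
space_filling_loss = 0.5 * np.log2(2 * np.pi * np.e / 12)   # about 0.2546
```

The constant depends neither on the step size nor on the source, which is why the same 0.254 bits/sample appears at every rate.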

Remark 2: When the rate goes to infinity, so does $K$. In that limiting case, the transfer function
$W(z)$ tends to unity, and it follows from [15] that the optimal filters asymptotically satisfy $|A(e^{j\omega})| =
\Omega_x(e^{j\omega})^{-1}$, $|B(e^{j\omega})| = \Omega_x(e^{j\omega})$, $|1 - F(e^{j\omega})| = \exp\!\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln \Omega_x(e^{j\omega})\, d\omega\right) / \Omega_x(e^{j\omega})$. Moreover, when
$K \to \infty$, the system of Fig. 4 achieves $R_c^{op}(D)$ which, in this asymptotic regime, coincides with $\overline{R_c^{it}}(D) +
0.5\log_2(2\pi e/12)$, with $\overline{R_c^{it}}(D)$ tending to $R(D)$. ▲

D. Achieving $\overline{R_c^{it}}(D) + 1.254$ bits/sample With Zero Delay

If the requirement of zero delay, which is stronger than that of causality, were to be satisfied, then it
would not be possible to apply entropy coding to long sequences of quantized samples. This would entail
an excess bit-rate not greater than 1 bit per sample, see, e.g., [13, Section 5.4]. Consequently, we have
the following result:

Theorem 7: The OPTA of zero-delay codes, say $R_{ZD}^{op}(D)$, can be upper bounded by the operational
rate of the scheme of Fig. 4 when each quantized output value is entropy-coded independently, conditioned
to the current dither value. Thus

$$R_{ZD}^{op}(D) < \overline{R_c^{it}}(D) + \tfrac{1}{2}\log_2\!\left(\tfrac{2\pi e}{12}\right) + 1 \approx \overline{R_c^{it}}(D) + 0.254 + 1 \text{ bits/sample}. \tag{47}$$

▲

The 0.254 bits per sample in (47), commonly referred to as the "space-filling loss" of scalar quantization,
can be reduced by using vector quantization [11], [20]. Vector quantization could be applied while
preserving causality (and without introducing delay) if the samples of the source were $N$-dimensional
vectors. This would also allow for the use of entropy coding over $N$-dimensional vectors of quantized
samples, which reduces the extra 1 bit/sample at the end of (47) to $1/N$ bits/sample, see [13, Theorem 5.4.2].

E. The Additive Rate Loss of Causality Arises from Two Factors 

It is worth noting that Lemma 3 and the above analysis reveal an interesting fact: the rate loss due to
causality for Gaussian sources with memory, that is, the difference between the OPTA of causal codes 
and R(D), is upper bounded by the sum of two terms. The first term is 0.254 bits/sample, and results 
from the space filling loss associated with scalar quantization, as was also pointed out in O for the 
high resolution situation. This term is associated only with the encoder. For a scalar Gaussian stationary 
source, such excess rate can only be avoided by jointly quantizing blocks of consecutive source samples 
(vector quantization), i.e., by allowing for non-causal encoding (or by encoding several parallel sources). 
The second term can be attributed to the reduced de-noising capabilities of causal filters, compared to 
those of non-causal (or smoothing) filters. The contribution of the causal filtering aspect to the total 
rate-loss is indeed $\overline{R_c^{it}}(D) - R(D)$. This latter gap can also be associated with the performance loss of
causal decoding. 

As a final remark, we note that the architecture of Fig. 2, which allowed us to pose the search of
$\overline{R_c^{it}}(D)$ as a convex optimization problem, is by no means the only scheme capable of achieving the
upper bounds (46) and (47). For instance, it can be shown that the same performance can be attained
removing either $A(z)$ or $F(z)$ in the system of Fig. 2, provided an entropy coder with infinite memory
is used. Indeed, the theoretical optimality (among causal codes) of the differential pulse code modulation
(DPCM) architecture, with predictive feedback and causal MMSE estimation at the decoding end, has
been shown in a different setting [21].



VII. Example 

To illustrate the upper bounds presented in the previous sections, we here evaluate $B_1(D)$, $B_2(D)$, and
$B_3(D)$, and calculate an approximation of $\overline{R_c^{it}}(D)$ via Iterative Procedure 2, for two zero-mean Gaussian
sources, one AR-1 and one AR-2. These sources were generated by the recursion

$$x(k) = a_1 x(k-1) + a_2 x(k-2) + z(k), \quad \forall k \in \mathbb{Z}, \tag{48}$$

where the elements of the process $\{z(k)\}$ are i.i.d. zero-mean unit-variance Gaussian random variables.

Iterative Procedure 2 was carried out by restricting W(z) to be an 8-tap FIR filter. For each of the 
target rates considered, the procedure was stopped after four complete iterations. 

The first-order source (Source 1) was chosen by setting the values of the coefficients in (48) to be
$a_1 = 0.9$, $a_2 = 0$. This amounts to zero-mean, unit variance white Gaussian noise filtered through the
colouring transfer function $z/(z - 0.9)$. The second-order source (Source 2) consisted of zero-mean, unit
variance white Gaussian noise filtered through the colouring transfer function $z^2/[(z - 0.9)(z - 0.1)]$.
The resulting upper bounds for Source 1 and Source 2 are shown in Figs. 5 and 6, respectively. As
predicted by (90) and (33), all the upper bounds for $R_c^{it}(D)$ derived in Section V converge to $R(D)$ in
the limit of both large and small distortions (that is, when $D \to \sigma_x^2$ and $D \to 0^+$, respectively).

Figure 5. $R(D)$ (in bits/sample) and several upper bounding functions for $R_c^{it}(D)$ for zero-mean unit variance white Gaussian
noise filtered through $z/(z - 0.9)$. The resulting source variance is 5.26.

Figure 6. $R(D)$ (in bits/sample) and several upper bounding functions for $R_c^{it}(D)$ for zero-mean unit variance white Gaussian
noise filtered through $z^2/[(z - 0.9)(z - 0.1)]$. The resulting source variance is 6.37.

For both sources, the gap between $\overline{R_c^{it}}(D)$ and $R(D)$ is significantly smaller than 0.5 bits/sample, for
all rates at which $\overline{R_c^{it}}(D)$ was evaluated. Indeed, this gap is smaller than 0.22 bit/sample for both sources.
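The source variances quoted in the captions of Figs. 5 and 6 can be cross-checked from the colouring filters alone. The sketch below is an illustration with a hypothetical helper `ar_variance`: it runs the all-pole recursion on a unit impulse and sums the squared impulse response, which equals the output variance for unit-variance white input.

```python
import numpy as np

def ar_variance(poles, n_terms=2000):
    """Variance of unit-variance white noise filtered by z^p / prod(z - pole).

    Computed as the energy of the truncated impulse response (illustrative helper).
    """
    a = np.poly(poles)               # denominator coefficients in powers of z^{-1}
    h = np.zeros(n_terms)
    for k in range(n_terms):         # all-pole recursion driven by a unit impulse
        acc = 1.0 if k == 0 else 0.0
        for i in range(1, len(a)):
            if k - i >= 0:
                acc -= a[i] * h[k - i]
        h[k] = acc
    return float(np.sum(h ** 2))

var_source1 = ar_variance([0.9])         # Source 1: z / (z - 0.9)
var_source2 = ar_variance([0.9, 0.1])    # Source 2: z^2 / [(z - 0.9)(z - 0.1)]
print(round(var_source1, 2), round(var_source2, 2))   # 5.26 6.37
```

For Source 2 the poles 0.9 and 0.1 correspond to $a_1 = 1.0$ and $a_2 = -0.09$ in (48).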

For the first-order source, the magnitude of the coefficients of the FIR filter $W(z)$ obtained decays
rapidly with coefficient index. For example, when running five cycles of Iterative Procedure 2, using a
10th order FIR filter for $W(z)$, for Source 1 at $R = 0.2601$ bits/sample, the obtained $W(z)$ was

$$W(z) = 0.3027 + 0.1899z^{-1} + 0.1192z^{-2} + 0.0748z^{-3} + 0.0470z^{-4} + 0.0296z^{-5} + 0.0188z^{-6} + 0.0123z^{-7} + 0.0086z^{-8} + 0.0070z^{-9}.$$

Such fast decay of the impulse response of $W(z)$ suggests that, at least for AR-1 sources, there is little
to be gained by letting $W(z)$ be an FIR filter of larger order. (It is worth noting that, in the iterative
procedure, the initial guess for $W(z)$ is a unit scalar gain.) The frequency response magnitude of $W(z)$
is plotted in Fig. 7, together with $\Omega_x(e^{j\omega})$ and the resulting frequency response magnitude $|1 - F(e^{j\omega})|$
after four iterations on Source 1 for a target rate of $\overline{R_c^{it}}(D) = 0.2601$ bits/sample.
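The decay of the reported taps can be quantified directly. The following sketch, given only as an illustration, takes the ten coefficients listed above, computes the ratio between consecutive taps and evaluates $W(e^{j\omega})$ on a frequency grid through the discrete-time Fourier transform of the taps. The grid size is an arbitrary choice.

```python
import numpy as np

# FIR taps of W(z) reported above for Source 1 at R = 0.2601 bits/sample
c = np.array([0.3027, 0.1899, 0.1192, 0.0748, 0.0470,
              0.0296, 0.0188, 0.0123, 0.0086, 0.0070])

ratios = c[1:] / c[:-1]      # tap-to-tap decay factors
dc_gain = c.sum()            # W(e^{j0}), the gain at zero frequency

w = np.linspace(0.0, np.pi, 512)
W = np.exp(-1j * np.outer(w, np.arange(c.size))) @ c   # W(e^{jw}) on the grid
mag = np.abs(W)

print(np.round(ratios[:3], 3))   # first ratios, all near 0.63
print(round(dc_gain, 4))         # 0.8099
```

The first taps decay almost geometrically, with a ratio of roughly 0.63, and the magnitude response is largest at $\omega = 0$, where the source PSD is also largest.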

Notice that for Source 1, after four iterations of Iterative Procedure 2, the obtained values for $\overline{R_c^{it}}(D)$
are almost identical to $R_c^{it}(D)$, evaluated according to (29). This suggests that Iterative Procedure 2 has
fast convergence. For example, when applying four iterations of Iterative Procedure 2 to Source 1 with a
target rate of 0.2601 bits/sample, the distortions obtained after each iteration were 1.6565, 1.6026, 1.6023
and 1.6023, respectively. For the same source with a target rate of 0.0441 bits/sample, the distortion took
the values 4.0152, 3.9783, 3.9783, and 3.9782 as the iterations proceeded. A similar behaviour is observed
for other target rates, and for other choices of $a_1$ in (48) as well. Thus, at least for AR-1 sources, one
gets close to the global optimum $\overline{R_c^{it}}(D)$ after just three iterations.

Figure 7. $\Omega_x(e^{j\omega})$, $|1 - F(e^{j\omega})|$ and $|W(e^{j\omega})|$ of an approximate realization of $\overline{R_c^{it}}(D)$ for a Gaussian stationary source with
PSD $|1/(1 - 0.9e^{-j\omega})|^2$ when the rate is 0.2601 [bit/sample], using the system shown in Fig. 2. These frequency responses
were obtained after four iterations of Iterative Procedure 2, with filter $W(z)$ being FIR with 8 taps. Horizontal axis: $\omega$ [rad/s].

VIII. Conclusions 

In this paper we have obtained expressions and upper bounds to the causal and zero-delay rate
distortion function for Gaussian stationary sources and MSE as the distortion measure. We first showed
that for Gaussian sources with bounded differential entropy rate, the causal OPTA does not exceed the
information-theoretic RDF by more than approximately 0.254 bits/sample. After that, we derived an
explicit expression for the information-theoretic RDF under per-sample MSE distortion constraints using
a constructive method. This result was then utilized for obtaining a closed-form formula for the causal
information-theoretic RDF $R_c^{it}(D)$ of first-order Gauss-Markov sources under an average MSE distortion
constraint.

We then derived three closed-form upper bounding functions to the difference between $R_c^{it}(D)$ and
Shannon's RDF. Two of these bounding functions are tighter than the previously best known bound of
0.5 bits/sample, at all rates. We also provided a tighter fourth upper bound to $R_c^{it}(D)$, named $\overline{R_c^{it}}(D)$,
that is constructive. More precisely, we provide a practical scheme that attains this bound, based on a
noise-shaped predictive coder consisting of an AWGN channel surrounded by pre-, post-, and feedback
filters. For a given source spectral density and desired distortion, the design of the filters is convex
in their frequency responses. We proposed an iterative algorithm, which is guaranteed to converge to
the optimal set of unique filters. Moreover, the mutual information obtained across the AWGN channel
converges monotonically to $\overline{R_c^{it}}(D)$. Thus, one avoids having to solve the more complicated minimization
of the mutual information over all possible conditional distributions satisfying the distortion constraint. To
achieve the upper bounds on the operational coding rates, one may simply replace the AWGN channel by
a subtractively-dithered scalar quantizer and use memoryless entropy coding conditioned to the dither
values.

IX. Proof of Lemma 2

We will first show that $R_c^{it(\ell)}(D)$ can be realized by a vector AWGN channel between two square
matrices. It was already established in Lemma 1 that an output $\mathbf{y}$ corresponds to a realization of $R_c^{it(\ell)}(D)$
only if it is jointly Gaussian with the source $\mathbf{x}$. From this Gaussianity condition, the MMSE estimator
of $\mathbf{y}$ from $\mathbf{x}$, say $\hat{\mathbf{y}}$, is given by

$$\hat{\mathbf{y}} = K_{yx} K_x^{-1} \mathbf{x}, \tag{49}$$

where the inverse of $K_x$ exists from the fact that $\mathbf{x}$ has bounded differential entropy. It is clear from (49)
and the joint Gaussianity between $\mathbf{x}$ and $\mathbf{y}$ that the causality condition is satisfied if and only if the
matrix

$$K_{yx} K_x^{-1} \text{ is lower triangular.} \tag{50}$$

On the other hand, the distortion constraint (fTST ) can be expressed as

$$\tfrac{1}{\ell}\operatorname{tr}\{E[(\mathbf{y} - \mathbf{x})(\mathbf{y} - \mathbf{x})^T]\} = \tfrac{1}{\ell}\operatorname{tr}\{K_y - K_{yx} - (K_{yx})^T + K_x\} \le D. \tag{51}$$

From the definition of $R_c^{it(\ell)}(D)$, for every $\epsilon > 0$, there exists an output vector $\mathbf{y}$ jointly Gaussian with $\mathbf{x}$
such that $K_y$ and $K_{yx}$ satisfy (50), (51) and

$$\tfrac{1}{\ell} I(\mathbf{x}; \mathbf{y}) \le R_c^{it(\ell)}(D) + \epsilon. \tag{52}$$

We will now describe a simple scheme which is capable of reproducing the joint statistics between x 
and any given y jointly Gaussian with x satisfying (l50l . (IBTI ) and (l52l . 

Suppose x is first multiplied by a matrix A £ M. exe yielding the random vector v = Ax. Then a 
vector with Gaussian i.i.d. entries with unit variance, independent from x, say n G ]R , is added to v, to 
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yield the random vector w = v + n. Finally, this result is multiplied by a matrix B G M. £x£ to yield the 
output 

y = BAx. + Bn. (53) 

On the other hand, the joint second-order statistics between y and x are fully characterized by the matrices 

K yx = E[yx T ]=BAK x (54) 

K y = E[yy T ] = BAK X (BA) T + BB T . (55) 

It can be seen from these equations that all that is needed for the system described above to reproduce 
any given pair of covariance matrices K y , K yx is that the matrices A and B satisfy 

BA = K yx K x l (56) 
BB T = M = K y - K yx K^ l K xy (57) 

Thus, $B$ can be chosen, for example, as the lower-triangular matrix in a Cholesky factorization of $M$. With this, a tentative solution for $A$ could be obtained as $A = B^{\dagger} K_{yx} K_x^{-1}$, which would satisfy (56) if and only if $BB^{\dagger} K_{yx} K_x^{-1} = K_{yx} K_x^{-1}$. The latter holds if and only if $\mathrm{span}\{K_{yx}\} \subseteq \mathrm{span}\{B\}$ (recall that $K_x$ is non-singular since $x$ has bounded differential entropy). We will now show by contradiction that this condition actually holds. Suppose $\mathrm{span}\{K_{yx}\} \nsubseteq \mathrm{span}\{B\}$. Since $\mathrm{span}\{B\} = \mathrm{span}\{M\}$, this supposition is equivalent to $\mathrm{span}\{K_{yx}\} \nsubseteq \mathrm{span}\{M\}$. If this were the case, then there would exist $s \in \mathbb{R}^{\ell}$ such that $s^T K_{yx} \neq 0$ and $s^T M = 0$. The latter, combined with (57), would imply $s^T K_y \neq 0$ (indeed, $s^T K_y s = s^T K_{yx} K_x^{-1} K_{xy} s > 0$). One could then construct the scalar random variable $r \triangleq s^T y$, which would have non-zero variance. The MSE of predicting $r$ from $x$ is given by

$$K_r - K_{rx} K_x^{-1} K_{rx}^T = s^T \left(K_y - K_{yx} K_x^{-1} K_{xy}\right) s = s^T M s = 0.$$

From this, and in view of the fact that $r$ is Gaussian with non-zero variance, we conclude that $I(x; r)$ would be unbounded. However, by construction, the Markov chain $r \leftrightarrow y \leftrightarrow x$ holds, and therefore by the Data Processing Inequality we would have that $I(x; y) \ge I(x; r)$, implying that $I(x; y)$ is unbounded too. This contradicts the assumption that $y$ is a realization of $R_c^{it(\ell)}(D)$, leading to the conclusion that $\mathrm{span}\{K_{yx}\} \subseteq \mathrm{span}\{B\}$. Therefore, the choice

$$A = B^{\dagger} K_{yx} K_x^{-1} \tag{58}$$

is guaranteed to satisfy (56), and thus for every $\epsilon > 0$, there exist matrices $B$ and $A$ which yield an output vector satisfying (50), (51) and (52).

On the other hand, we have that

$$I(x; y) = I(v; y) = I(v; w). \tag{59}$$

The first equality follows from the data-processing inequality and the fact that $v$ is obtained deterministically from $x$. The second equality stems from (58), which implies that $\mathrm{span}\{A\} \cap \mathcal{N}\{B\} = \{0\}$. The latter means that $B$ is invertible along all the directions in which $v$ has energy, which together with the fact that $n$ is i.i.d. and independent of $v$ implies $h(v \mid y) = h(v \mid w)$. Therefore, if $A$ and $B$ yield an output $y$ such that $(1/\ell) I(x; y) \le R_c^{it(\ell)}(D) + \epsilon$, then $(1/\ell) I(v; w) \le R_c^{it(\ell)}(D) + \epsilon$.
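As a minimal numerical sketch of the construction in (56)-(58), the following Python fragment builds $B$ as a Cholesky factor of $M$ and $A = B^{-1} K_{yx} K_x^{-1}$ for a 2x2 case. The covariance values, the matrix `G` and the helper names are hypothetical and chosen only for illustration; the example also assumes $M$ is positive definite, so that the pseudo-inverse reduces to the ordinary inverse.

```python
import math

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def inv2(P):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def chol2(M):
    """Lower-triangular Cholesky factor of a positive-definite 2x2 matrix."""
    l11 = math.sqrt(M[0][0])
    l21 = M[1][0] / l11
    l22 = math.sqrt(M[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def build_channel(K_x, K_y, K_yx):
    """Return (A, B, M, target) with B B^T = M and B A = target = K_yx K_x^{-1}."""
    target = matmul(K_yx, inv2(K_x))
    correction = matmul(target, transpose(K_yx))   # K_yx K_x^{-1} K_xy
    M = [[K_y[i][j] - correction[i][j] for j in range(2)] for i in range(2)]
    B = chol2(M)
    A = matmul(inv2(B), target)   # B is invertible here, so B^+ = B^{-1}
    return A, B, M, target

# Hypothetical covariances generated by y = G x + e, with e independent of x.
K_x = [[2.0, 1.0], [1.0, 2.0]]
G = [[1.0, 0.0], [0.5, 1.0]]
K_e = [[1.0, 0.2], [0.2, 0.5]]
K_yx = matmul(G, K_x)
K_y = [[s + t for s, t in zip(r1, r2)]
       for r1, r2 in zip(matmul(K_yx, transpose(G)), K_e)]
A, B, M, target = build_channel(K_x, K_y, K_yx)
```

In this toy setting $M$ coincides with the covariance of `e`, and the product `matmul(B, A)` reproduces $K_{yx} K_x^{-1}$, as required by (56).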

Finally, if we keep the $A$ and $B$ satisfying the above conditions and replace the noise $n$ by the vector of noise samples $m$ with unit variance introduced by $\ell$ independently operating subtractively-dithered uniform scalar quantizers (SDUSQs) ifTTl, with their outputs being jointly entropy-coded conditioned to the dither, then the operational data rate $r(x, y)$ (determined by the expected code length $\mathrm{E}[L_\ell(x)]$) would be upper bounded by ifTTTl

$$r(x, y) \le \frac{1}{\ell} I(v; u) + \frac{1}{\ell} \le \frac{1}{\ell} I(v; w) + \frac{1}{2}\log_2(2\pi e) + \frac{1}{\ell},$$

where $u = v + m$ is the output of the ECDQ channel. Since the distortion yielded by the SDUSQs is the same as that obtained with the original Gaussian channel, we conclude that

$$R_c^{op(\ell)}(D) \le R_c^{it(\ell)}(D) + \frac{1}{2}\log_2(2\pi e) + \frac{1}{\ell} + \epsilon \quad \text{bits/sample}.$$

Given that the above holds for any $\epsilon > 0$ and since $R_c^{op(\ell)}(D)$ is defined as an infimum, we conclude that $R_c^{op(\ell)}(D) \le R_c^{it(\ell)}(D) + \frac{1}{2}\log_2(2\pi e) + \frac{1}{\ell}$, which completes the proof. □

X. Proof of Theorem 1

We will start by showing that

$$R_c^{op}(D) = \limsup_{\ell\to\infty} R_c^{op(\ell)}(D). \tag{60}$$

First, following exactly the same proof as in Lemma 6 in the Appendix, it is straightforward to show that

$$R_c^{op}(D) \ge \limsup_{\ell\to\infty} R_c^{op(\ell)}(D). \tag{61}$$

Now, consider the following family of encoding/decoding schemes. For some positive integer $\ell$, the entire source sequence is encoded in blocks of $\ell$ contiguous samples. Encoding and decoding of each block is independent of the encoding and decoding of any other block. As in the scheme described in the second part of the proof of Lemma 2, each block is encoded and decoded utilizing $\ell$ parallel and independent SDUSQs, with their outputs jointly entropy coded conditioned to the dither values, and using the optimal pre- and post-processing matrices. For such an ED pair, and from (5), the operational rate after $k$ samples have been reconstructed is

$$r(x^k, y^k) = \frac{\lceil k/\ell \rceil\, \ell}{k}\, R_c^{op(\ell)}(D) \le R_c^{op(\ell)}(D) + \frac{\ell}{k}\, R_c^{op(\ell)}(D), \tag{62}$$

where $\lceil\cdot\rceil$ denotes rounding to the nearest larger integer (since the $k$-th sample is reconstructed only after $\lceil k/\ell \rceil$ blocks of length $\ell$ are decoded). On the other hand, since the variance of each reconstruction error sample cannot be larger than the variance of the source, we have that the average distortion associated with the first $k$ samples is upper bounded as

$$d(x^k, y^k) \le \frac{\lfloor k/\ell \rfloor\, \ell}{k}\, D + \frac{k - \lfloor k/\ell \rfloor\, \ell}{k}\, \sigma_x^2 \le D + \frac{\ell}{k}\, \sigma_x^2, \tag{63}$$

where $\lfloor\cdot\rfloor$ denotes rounding to the nearest smaller integer. Therefore, for any finite $\ell$, the average distortion of this scheme equals $D$ when $k \to \infty$ (i.e., when we consider the entire source process). Also, from (62) and (5), letting $k \to \infty$ we conclude that

$$R_c^{op}(D) \le R_c^{op(\ell)}(D). \tag{64}$$
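The block-wise bookkeeping behind the rate and distortion bounds above can be illustrated with a small Python sketch. It assumes, as in the argument, that the rate is charged for $\lceil k/\ell \rceil$ whole blocks and that samples of a not-yet-decoded block contribute at most the source variance to the distortion; the function names are hypothetical.

```python
import math

def rate_overhead(k, block_len):
    """Factor by which the rate after k samples exceeds the per-block rate."""
    return math.ceil(k / block_len) * block_len / k

def distortion_bound(k, block_len, D, var_x):
    """Average distortion bound over the first k samples.

    Samples in fully decoded blocks contribute D on average; the remaining
    samples are bounded by the source variance var_x.
    """
    decoded = (k // block_len) * block_len
    return (decoded * D + (k - decoded) * var_x) / k

# Both penalties vanish as k grows for a fixed block length.
for k in (10, 1000, 10**6 + 1):
    print(k, rate_overhead(k, 4), distortion_bound(k, 4, 0.1, 1.0))
```

The overhead factor never exceeds $1 + \ell/k$ and the distortion never exceeds $D + (\ell/k)\sigma_x^2$, which is what drives the limit $k \to \infty$.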
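If $\limsup_{\ell\to\infty} R_c^{op(\ell)}(D)$ exists, then, for every $\epsilon > 0$, there exists a finite $\ell_0(\epsilon) \in \mathbb{N}$ such that

$$R_c^{op(\ell)}(D) \le \limsup_{\ell\to\infty} R_c^{op(\ell)}(D) + \epsilon, \quad \forall \ell \ge \ell_0(\epsilon). \tag{65}$$

Therefore, for every $\epsilon > 0$, there exists a finite $\ell_0(\epsilon) \in \mathbb{N}$ such that

$$R_c^{op}(D) \le \limsup_{\ell\to\infty} R_c^{op(\ell)}(D) + \epsilon, \quad \forall \ell \ge \ell_0(\epsilon). \tag{66}$$

Since $R_c^{op}(D)$ is defined as an infimum among all causal codes (which, in particular, means $\ell$ can be chosen larger than $\ell_0(\epsilon)$ for any $\epsilon > 0$), it readily follows from (61), (66), Lemma 2 and Lemma 7 that

$$R_c^{op}(D) = \limsup_{\ell\to\infty} R_c^{op(\ell)}(D) \le \limsup_{\ell\to\infty} R_c^{it(\ell)}(D) + \frac{1}{2}\log_2(2\pi e) \le R_c^{it}(D) + \frac{1}{2}\log_2(2\pi e),$$

completing the proof. □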

XI. Proof of Theorem 2

From Lemma 1, for any given reconstruction-error covariance matrix, the mutual information is minimized if and only if the output is jointly Gaussian with the source. In addition, for any given mutual information between $x^\ell$ and a jointly Gaussian output $y^\ell$, the variance of every reconstruction error sample $z(k) \triangleq y(k) - x(k)$ is minimized if and only if $z(k)$ is the estimation error resulting from estimating $x(k)$ from $y^k$, that is, if and only if

$$0 = \mathrm{E}[z_k\, y^k] = \mathrm{E}[(y_k - x_k)\, y^k], \quad \forall k = 1,\dots,\ell, \tag{67}$$

which for Gaussian vectors implies that $z(k)$ and $y^k$ are independent, and therefore

$$h(z(k) \mid y^k) = h(z(k)), \quad \forall k = 1,\dots,\ell. \tag{68}$$

Thus, hereafter we restrict the analysis to output processes jointly Gaussian with and causally related to $x^\ell$ which also satisfy (67). For any such output process, say $y^\ell$, the following holds:

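\begin{align}
\frac{1}{\ell} I(x^\ell; y^\ell) &= \frac{1}{\ell}\sum_{k=1}^{\ell} I(x^k; y(k) \mid y^{k-1}) \tag{69}\\
&\ge \frac{1}{\ell}\sum_{k=1}^{\ell} I(x(k); y(k) \mid y^{k-1}) \tag{70}\\
&= \frac{1}{\ell}\left[h(x(1)) - h(x(1) \mid y(1))\right] + \frac{1}{\ell}\sum_{k=2}^{\ell}\left[h(x(k) \mid y^{k-1}) - h(x(k) \mid y^{k})\right] \tag{71}\\
&= \frac{1}{\ell}\left[h(x(1)) - h(z(1))\right] + \frac{1}{\ell}\sum_{k=2}^{\ell}\left[h(x(k) \mid y^{k-1}) - h(z(k))\right] \tag{72}\\
&= \frac{1}{2\ell}\ln\left(\frac{\sigma^2_{x(1)}}{\sigma^2_{z(1)}}\right) + \frac{1}{\ell}\sum_{k=2}^{\ell}\left[h\big(-a_{k-1} z(k-1) + \xi(k-1)\big) - h(z(k))\right] \tag{73}\\
&= \frac{1}{2\ell}\ln\left(\frac{\sigma^2_{x(1)}}{\sigma^2_{z(1)}}\right) + \frac{1}{2\ell}\sum_{k=2}^{\ell}\ln\left(\frac{a_{k-1}^2\sigma^2_{z(k-1)} + \sigma^2_{\xi(k-1)}}{\sigma^2_{z(k)}}\right). \tag{74}
\end{align}

In the above, (69) follows because $y^\ell$ depends causally upon $x^\ell$. In turn, inequality (70) is due to the fact that $I(x^k; y(k) \mid y^{k-1}) = h(y(k) \mid y^{k-1}) - h(y(k) \mid y^{k-1}, x^k) \ge h(y(k) \mid y^{k-1}) - h(y(k) \mid y^{k-1}, x(k))$, and thus equality holds in (70) if and only if the following Markov chain is satisfied:

$$y(k) \leftrightarrow \{x(k), y^{k-1}\} \leftrightarrow x^{k-1}, \quad \forall k = 1,\dots,\ell. \tag{75}$$

Finally, (72) and (73) follow because $y^\ell$ satisfies (68) for all $k = 1,\dots,\ell$.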

Thus, the mutual information $I(x^\ell; y^\ell)$ of every output $y^\ell$ that is a candidate to constitute a realization of $R_c^{SRD}(D_1,\dots,D_\ell)$ is lower bounded by the RHS of (74), which in turn depends only on the error variances $\{\sigma^2_{z(k)}\}_{k=1}^{\ell}$ associated with $y^\ell$. We shall now see that this lower bound is minimized by a unique set of error variances, and then show that the resulting bound is achievable while having these error variances.

Revisiting the chain of (in)equalities above, we have that $\frac{1}{2}\ln\big([a_{k-1}^2\sigma^2_{z(k-1)} + \sigma^2_{\xi(k-1)}]/\sigma^2_{z(k)}\big) = h(x(k) \mid y^{k-1}) - h(x(k) \mid y^{k}) \ge 0$ and $\frac{1}{2}\ln\big(\sigma^2_{x(1)}/\sigma^2_{z(1)}\big) = h(x(1)) - h(x(1) \mid y(1)) \ge 0$. Therefore, in a realization of $R_c^{SRD}(D_1,\dots,D_\ell)$, it holds that

$$\sigma^2_{z(1)} \le \sigma^2_{x(1)}, \tag{76a}$$
$$\sigma^2_{z(k)} \le a_{k-1}^2\sigma^2_{z(k-1)} + \sigma^2_{\xi(k-1)} = \sigma^2_{x(k)} - a_{k-1}^2\sigma^2_{y(k-1)}, \quad \forall k = 2,\dots,\ell. \tag{76b}$$

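With this, and since the right-hand side of (74) decreases when any error variance increases, the minimum value of the right-hand side of (74) subject to the constraints

$$\sigma^2_{z(k)} \le D_k, \quad k = 1,\dots,\ell \tag{77}$$

is attained when these variances satisfy $\sigma^2_{z(k)} = d_k$, for $k = 1,\dots,\ell$ (see (23)). Therefore, for all outputs $y^\ell$ causally related to and jointly Gaussian with $x^\ell$ satisfying the distortion constraints, it holds that

$$\frac{1}{\ell} I(x^\ell; y^\ell) \ge \frac{1}{2\ell}\ln\left(\frac{\sigma^2_{x(1)}}{d_1}\right) + \frac{1}{2\ell}\sum_{k=2}^{\ell}\ln\left(\frac{a_{k-1}^2 d_{k-1} + \sigma^2_{\xi(k-1)}}{d_k}\right), \tag{78}$$

with equality if and only if $y^\ell$ satisfies (67), (75) and (77).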
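To make the dependence of the lower bound on the error variances concrete, here is a minimal Python sketch that evaluates the bound for a first-order Gauss-Markov source with a constant coefficient. The stationarity assumption (constant `a` and `var_xi`), the function name and the numerical values are illustrative assumptions, not part of the original derivation.

```python
import math

def srd_lower_bound(var_x1, a, var_xi, err_vars):
    """Per-sample lower bound (in nats) for x(k) = a x(k-1) + xi(k-1).

    var_x1   : variance of the first source sample
    a        : (constant) recursion coefficient
    var_xi   : (constant) innovation variance
    err_vars : error variances sigma_z(1)^2, ..., sigma_z(l)^2
    """
    ell = len(err_vars)
    total = math.log(var_x1 / err_vars[0])
    for k in range(1, ell):
        total += math.log((a * a * err_vars[k - 1] + var_xi) / err_vars[k])
    return total / (2 * ell)

# Unit-variance stationary source: var_xi = 1 - a^2 with a = 0.9.
flat = srd_lower_bound(1.0, 0.9, 0.19, [0.1, 0.1, 0.1])
bumped = srd_lower_bound(1.0, 0.9, 0.19, [0.1, 0.12, 0.1])
print(flat, bumped)  # increasing one error variance lowers the bound
```

Raising the second error variance from 0.1 to 0.12 lowers the bound, in line with the monotonicity used in the argument.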

Now we will show that for any distortion schedule $\{D_k\}_{k=1}^{\ell}$, the output $y^\ell$ yielded by the recursive algorithm of Procedure 1 is such that $I(x^\ell; y^\ell)$ equals the lower bound (78), thus being a realization of $R_c^{SRD}(D_1,\dots,D_\ell)$.

We will first demonstrate that $\{y(k)\}$ satisfies the causality Markov chain

$$y^k \leftrightarrow x^k \leftrightarrow x_{k+1}^{\infty}, \quad \forall k \in \mathbb{N} \tag{79}$$

and the conditions (67) (MMSE) and (75) (Source's Past Independence), which are necessary and sufficient to attain equality in (78).

Causality condition (79): Let $A \triangleq K_{y^\ell x^\ell}(K_{x^\ell})^{-1}$. Suppose $y^{k-1}$ satisfies causality. Then, since $K_{y^\ell x^\ell} = A K_{x^\ell}$, it follows from (50) that the top-left square submatrix $[A]^{k-1} \in \mathbb{R}^{(k-1)\times(k-1)}$ of $A$ is lower triangular, being given by

$$[A]^{k-1} = K_{y^{k-1} x^{k-1}} (K_{x^{k-1}})^{-1}. \tag{80}$$

Then Step 2 of the algorithm is equivalent to

$$\mathrm{E}[y^{k-1} x_k] = [A]^{k-1}\, \mathrm{E}[x^{k-1} x_k]. \tag{81}$$

This means that the top $(k-1)$ entries in the $k$-th column of $K_{y^\ell x^\ell}$ depend only on the entries of $K_{x^\ell}$ above its $k$-th row. Recalling that $K_{y^\ell x^\ell} = A K_{x^\ell}$, we conclude that $[A]^{k}$ is also lower triangular, and thus $y^k$ also satisfies causality. Notice that for any given $K_{x^{k}}$ and $K_{y^{k-1} x^{k-1}}$ satisfying causality up to sample $k-1$, the vector $\mathrm{E}[y^{k-1} x_k]$ yielded by Step 2 is the only vector consistent with $x^k$, $y^k$ satisfying causality up to the $k$-th sample.

MMSE condition (67): Step 1 guarantees that (67) is satisfied for $k = 1$. Steps 3, 4 and 5 mean that $\mathrm{E}[y^k y_k] = \mathrm{E}[y^k x_k]$ for all $k = 2,\dots,\ell$. Therefore, the reconstruction vector $y^\ell$ yielded by the above algorithm satisfies (67) for all $k = 1,\dots,\ell$.

Source's past independence (75): Since all variables are jointly Gaussian, condition (75) is equivalent to

$$\mathrm{E}\big[(y_k - \mathrm{E}[y_k \mid x_k, y^{k-1}])\,(x^{k-1})^T\big] = 0, \tag{82}$$

for all $k = 1,\dots,\ell$. On the other hand,

$$\mathrm{E}[y_k \mid x_k, y^{k-1}] = \mathrm{E}\big[y_k\,[(y^{k-1})^T\ x_k]\big] \begin{pmatrix} K_{y^{k-1}} & \mathrm{E}[x_k y^{k-1}] \\ \mathrm{E}[x_k y^{k-1}]^T & \mathrm{E}[x_k^2] \end{pmatrix}^{-1} \begin{bmatrix} y^{k-1} \\ x_k \end{bmatrix}. \tag{83}$$

From Steps 1, 3 and 4 it follows that $\mathrm{E}\big[y_k\,[(y^{k-1})^T\ x_k]\big] = \mathrm{E}[y_k (y^k)^T] = \mathrm{E}[x_k (y^k)^T]$. Substitution of this into (83) and the result into (82) leads directly to (24). Thus, (75) is satisfied for all $k = 1,\dots,\ell$.

Since the above algorithm yields an output which satisfies (79), (67) and (75) for all $k = 1,\dots,\ell$, this output attains equality in (78), thus being a realization of $R_c^{SRD}(D_1,\dots,D_\ell)$. Notice that once the distortions $\{d_k\}_{k=1}^{\ell}$ are given, each step in the recursive algorithm yields the only variances and covariances that satisfy (79), (67) and (75). Therefore, for any given distortion schedule $\{D_k\}_{k=1}^{\ell}$, the latter algorithm yields the unique output that realizes $R_c^{SRD}(D_1,\dots,D_\ell)$. This completes the proof. □



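XII. Proof of Theorem 3

Consider the first $\ell$ samples of input and output. The average distortion constraint here takes the form

$$\frac{1}{\ell}\sum_{k=1}^{\ell}\sigma^2_{z(k)} \le D. \tag{84}$$

Then,

\begin{align}
R_c^{it(\ell)}(D) &= \inf_{y^\ell:\ (79)\ \text{and}\ (84)\ \text{hold}} \frac{1}{\ell} I(x^\ell; y^\ell) \notag\\
&= \inf_{\{\sigma^2_{z(k)}\}_{k=1}^{\ell}:\ (84)\ \text{holds}} R_c^{SRD}\big(\sigma^2_{z(1)},\dots,\sigma^2_{z(\ell)}\big) \notag\\
&= \inf_{\{\sigma^2_{z(k)}\}_{k=1}^{\ell}:\ (84)\ \text{holds}} \frac{1}{2\ell}\left[\ln\left(\frac{\sigma_x^2}{\sigma^2_{z(\ell)}}\right) + \sum_{k=1}^{\ell-1}\ln\left(a^2 + \frac{\sigma_\xi^2}{\sigma^2_{z(k)}}\right)\right] \notag\\
&\ge \inf_{\{\sigma^2_{z(k)}\}_{k=1}^{\ell}:\ (84)\ \text{holds}} \frac{1}{2\ell}\left[\ln\left(\frac{\sigma_x^2}{\sigma^2_{z(\ell)}}\right) + (\ell-1)\ln\left(a^2 + \frac{\sigma_\xi^2}{\frac{1}{\ell-1}\sum_{k=1}^{\ell-1}\sigma^2_{z(k)}}\right)\right], \tag{85}
\end{align}

where the last inequality follows from Jensen's inequality and the fact that $\ln(a^2 + \sigma_\xi^2/x)$ is a convex function of $x$. Equality is achieved if and only if all distortions $\sigma^2_{z(k)}$ equal some common value for all $k = 1,\dots,(\ell-1)$. Given that the RHS of (85) is minimized when constraint (84) is active (i.e., by making $\frac{1}{\ell}\sum_{k=1}^{\ell}\sigma^2_{z(k)} = D$), we can attain equality in (85) and minimize its RHS by picking

$$\sigma^2_{z(k)} = \frac{\ell D - \sigma^2_{z(\ell)}}{\ell - 1}, \quad \forall k \in \{1,2,\dots,\ell-1\}. \tag{86}$$

For this choice to be feasible, the distortion $\sigma^2_{z(\ell)}$ must satisfy (76), which translates into the constraint

$$\sigma^2_{z(\ell)} \le a^2\,\frac{\ell D - \sigma^2_{z(\ell)}}{\ell - 1} + \sigma_\xi^2. \tag{87}$$

Thus, substituting (86) into (85), we obtain

$$R_c^{it(\ell)}(D) = \inf_{\sigma^2_{z(\ell)}:\ (87)\ \text{holds}} \frac{1}{2\ell}\left[\ln\left(\frac{\sigma_x^2}{\sigma^2_{z(\ell)}}\right) + (\ell-1)\ln\left(a^2 + \frac{(\ell-1)\,\sigma_\xi^2}{\ell D - \sigma^2_{z(\ell)}}\right)\right]. \tag{88}$$

In view of (87), as $\ell \to \infty$, the value of $\sigma^2_{z(\ell)}$ that infimizes (88) remains bounded. Therefore,

$$\lim_{\ell\to\infty} R_c^{it(\ell)}(D) = \max\left\{\frac{1}{2}\ln\left(a^2 + \frac{\sigma_\xi^2}{D}\right),\ 0\right\}. \tag{89}$$

Finally, from Lemma 7 in the Appendix, we conclude that $R_c^{it}(D)$ equals the RHS of (89), completing the proof. □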
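A short Python sketch can illustrate how the finite-block rate approaches the closed-form limit for a first-order Gauss-Markov source. It assumes a stationary source $x(k+1) = a\,x(k) + \xi(k)$ with $\sigma_x^2 = \sigma_\xi^2/(1-a^2)$, equal distortions on the first $\ell-1$ samples as in (86), and a freely chosen last-sample distortion `var_last`; the function names and numerical values are hypothetical.

```python
import math

def causal_rdf_gauss_markov(a, var_xi, D):
    """Closed-form limit (nats/sample): max{0.5*ln(a^2 + var_xi/D), 0}."""
    return max(0.5 * math.log(a * a + var_xi / D), 0.0)

def finite_block_rate(a, var_xi, D, ell, var_last):
    """Per-sample rate over ell samples with equal distortion on samples
    1..ell-1, chosen so that the average distortion equals D."""
    var_x = var_xi / (1.0 - a * a)
    common = (ell * D - var_last) / (ell - 1)
    return (math.log(var_x / var_last)
            + (ell - 1) * math.log(a * a + var_xi / common)) / (2 * ell)

a, var_xi, D = 0.9, 0.19, 0.1
limit = causal_rdf_gauss_markov(a, var_xi, D)
for ell in (10, 1000, 10**6):
    print(ell, finite_block_rate(a, var_xi, D, ell, var_last=D), limit)
```

The contribution of the first and last samples is of order $1/\ell$, so the finite-block values approach the limit as $\ell$ grows; at $D = \sigma_x^2$ the closed form evaluates to zero.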

XIII. Proof of Theorem 4

The first inequality in (30) follows directly from Definitions 3 and 6. For a plain AWGN channel with noise variance $d$, the mutual information between source and reconstruction is

$$R_{AWGN}(d) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \frac{S_x(e^{j\omega})}{d}\right) d\omega.$$

On the other hand, by definition, the mutual information across a test channel that realizes $R^{\perp}(D)$ with distortion $D = d$ satisfies fffl :

$$R^{\perp}(d) \le R_{AWGN}(d).$$

In both cases the end-to-end distortion can be reduced by placing a scalar gain after the test channel. The optimal (minimum MSE) gain is $\sigma_x^2/(\sigma_x^2 + d)$. The mutual information from the source to the signal before the scalar gain is the same as that between the source and the signal after it. However, now the resulting end-to-end distortion is $D = \sigma_x^2 d/(\sigma_x^2 + d)$. Therefore, for a given end-to-end distortion $D$, the distortion between the source and the signal before the optimal scalar gain is

$$d = \frac{\sigma_x^2 D}{\sigma_x^2 - D},$$

which implies that the mutual informations across the $R^{\perp}$ channel and the AWGN channel when the optimal scalar gain is used are given by $R^{\perp}\big(\tfrac{\sigma_x^2 D}{\sigma_x^2 - D}\big)$ and $R_{AWGN}\big(\tfrac{\sigma_x^2 D}{\sigma_x^2 - D}\big)$, respectively. We then have that

\begin{align}
R_c^{it}(D) - R(D) &\le R^{\perp}\Big(\tfrac{\sigma_x^2 D}{\sigma_x^2 - D}\Big) - R(D) = B_2(D) \tag{90a}\\
&\le R_{AWGN}\Big(\tfrac{\sigma_x^2 D}{\sigma_x^2 - D}\Big) - R(D) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \frac{[\sigma_x^2 - D]\, S_x(e^{j\omega})}{\sigma_x^2 D}\right) d\omega - R(D) \notag\\
&= \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \Big[1 - \frac{D}{\sigma_x^2}\Big]\frac{S_x(e^{j\omega})}{D}\right) d\omega - R(D) = B_3(D). \tag{90b}
\end{align}
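The distortion bookkeeping around the optimal scalar gain can be checked numerically. The sketch below, with hypothetical function names, computes the MMSE gain for an additive-noise channel of noise variance $d$, evaluates the resulting end-to-end MSE directly, and inverts the relation to recover $d$ from a target distortion $D$; it assumes noise uncorrelated with the source.

```python
def mmse_gain(var_x, d):
    """Optimal scalar gain applied to w = x + noise (noise variance d)."""
    return var_x / (var_x + d)

def end_to_end_distortion(var_x, d):
    """MSE of g*w as an estimate of x, evaluated term by term."""
    g = mmse_gain(var_x, d)
    return (1.0 - g) ** 2 * var_x + g ** 2 * d

def pre_gain_distortion(var_x, D):
    """Noise variance d that yields end-to-end distortion D after the gain."""
    return var_x * D / (var_x - D)

D = end_to_end_distortion(1.0, 0.25)      # equals var_x*d/(var_x+d)
d_back = pre_gain_distortion(1.0, D)      # recovers the noise variance
print(D, d_back)
```

With unit source variance and $d = 0.25$ this gives $D = 0.2$, and inverting $D = 0.2$ returns $d = 0.25$.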



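To obtain the first function within the min operator on the RHS of (33), we notice from (12) that, since $\epsilon \le D \le \theta$, the RDF for a Gaussian stationary source with PSD $S_x^{\epsilon}(e^{j\omega}) \triangleq \max\{\epsilon, S_x(e^{j\omega})\}$, $\forall\omega \in [-\pi,\pi]$, say $R_\epsilon(\cdot)$, will equal the value $R(D)$ given by (12a) when the "water level" $\theta$ takes the same value as in (12). Hence, denoting by $D_\epsilon$ the distortion obtained in (12) when $S_x$ is substituted by $S_x^{\epsilon}$, we find that

$$R_\epsilon(D_\epsilon) = R(D) \iff D_\epsilon = \frac{1}{2\pi}\int_{-\pi}^{\pi}\min\{\theta, S_x^{\epsilon}(e^{j\omega})\}\, d\omega \le D + \epsilon. \tag{91}$$

On the other hand,

$$R_\epsilon(D_\epsilon) \ge \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(\frac{S_x^{\epsilon}(e^{j\omega})}{D_\epsilon}\right) d\omega. \tag{92}$$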

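With this, and starting from (90b), we have the following:

\begin{align}
R_c^{it}(D) - R(D) &\le \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \Big[1 - \frac{D}{\sigma_x^2}\Big]\frac{S_x(e^{j\omega})}{D}\right) d\omega - R(D) \notag\\
&\le \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \Big[1 - \frac{D}{\sigma_x^2}\Big]\frac{S_x^{\epsilon}(e^{j\omega})}{D}\right) d\omega - \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(\frac{S_x^{\epsilon}(e^{j\omega})}{D_\epsilon}\right) d\omega \tag{93}\\
&= \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(\frac{D_\epsilon}{S_x^{\epsilon}(e^{j\omega})} + \Big[1 - \frac{D}{\sigma_x^2}\Big]\frac{D_\epsilon}{D}\right) d\omega \tag{94}\\
&\le \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(\frac{D + \epsilon}{S_x^{\epsilon}(e^{j\omega})} + \Big[1 - \frac{D}{\sigma_x^2}\Big]\frac{D + \epsilon}{D}\right) d\omega \tag{95}\\
&\le \frac{1}{2}\log_2\left(\frac{D + \epsilon}{2\pi}\int_{-\pi}^{\pi}\frac{d\omega}{S_x^{\epsilon}(e^{j\omega})} + \Big[1 - \frac{D}{\sigma_x^2}\Big]\frac{D + \epsilon}{D}\right), \tag{96}
\end{align}

where (93) follows from (91) and (92) and by noting that $S_x^{\epsilon}(e^{j\omega}) \ge S_x(e^{j\omega})$, $\forall\omega \in [-\pi,\pi]$, (95) stems from (91), and (96) follows from Jensen's inequality. Notice that the RHS of (96) equals the first term on the RHS of (33).

The middle term on the RHS of (33) follows directly from (15). Finally, for distortions close to $\sigma_x^2$, a bound tighter than (96) can be obtained from (90b) as follows:

\begin{align}
R_c^{it}(D) - R(D) &\le B_3(D) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \frac{[\sigma_x^2 - D]\, S_x(e^{j\omega})}{\sigma_x^2 D}\right) d\omega - R(D) \notag\\
&\le \frac{1}{4\pi}\int_{-\pi}^{\pi}\log_2\left(1 + \frac{[\sigma_x^2 - D]\, S_x(e^{j\omega})}{\sigma_x^2 D}\right) d\omega \tag{97a}\\
&\le \frac{1}{2}\log_2\left(1 + \frac{\sigma_x^2 - D}{D}\right) = \frac{1}{2}\log_2\left(\frac{\sigma_x^2}{D}\right), \tag{97b}
\end{align}

which is precisely the third term on the RHS of (33). In the above, (97a) holds trivially since $R(D) \ge 0$, $\forall D \le \sigma_x^2$, and (97b) follows from Jensen's inequality. Therefore, equality holds in (97b) if and only if $\{x(k)\}$ is white. The validity of the chain of inequalities in (30) follows directly from (90) and (97). This completes the proof. □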
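The Jensen step that bounds the AWGN-channel integral by $\frac{1}{2}\log_2(\sigma_x^2/D)$ can be illustrated numerically. The following Python sketch assumes an AR(1) source with PSD $\sigma_\xi^2/|1 - a e^{-j\omega}|^2$ and uses a midpoint rule for the integral; the source model, function names and parameter values are illustrative assumptions.

```python
import math

def ar1_psd(a, var_xi, w):
    """PSD of x(k+1) = a x(k) + xi(k) at frequency w."""
    return var_xi / (1.0 - 2.0 * a * math.cos(w) + a * a)

def awgn_integral(a, var_xi, D, n=20000):
    """(1/4pi) * integral of log2(1 + (var_x - D) S_x / (var_x D)) over [-pi, pi]."""
    var_x = var_xi / (1.0 - a * a)
    c = (var_x - D) / (var_x * D)
    total = 0.0
    for i in range(n):
        w = -math.pi + (i + 0.5) * 2.0 * math.pi / n   # midpoint rule
        total += math.log2(1.0 + c * ar1_psd(a, var_xi, w))
    return 0.5 * total / n

def white_bound(var_x, D):
    """Jensen upper bound 0.5*log2(var_x / D)."""
    return 0.5 * math.log2(var_x / D)

coloured = awgn_integral(0.9, 0.19, 0.1)   # unit-variance AR(1) source
white = awgn_integral(0.0, 1.0, 0.1)       # unit-variance white source
print(coloured, white, white_bound(1.0, 0.1))
```

For the coloured source the integral falls strictly below the bound, while for the white source the two coincide, matching the equality condition stated above.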

XIV. Proof of Lemma 3

The idea of the proof is to first show that if the distortion $D_c$ equals $D > 0$, then

$$\frac{1}{2}\ln(K) = I(v(k); w(k)) \overset{(a)}{\ge} I(\{x(k)\}; \{y(k)\}) \ge R_c^{it}(D). \tag{98}$$

Immediately afterward we prove that, despite the distortion and causality constraints, the scheme in Fig. 2 has enough degrees of freedom to turn all the above inequalities into equalities. That means that if we are able to globally infimize $K$ over the filters of the system while satisfying the distortion and causality constraints, then that infimum, say $K_{\inf}$, must satisfy $(1/2)\ln(K_{\inf}) = R_c^{it}(D_c)$.

We now proceed to demonstrate the validity of (98) and to state the conditions under which equalities are achieved. The first equality in (98) follows from the fact that $\{n(k)\}$ is a Gaussian i.i.d. process. Inequality $(a)$ stems from the following:

I(v(k); w(fc)) = h(w{k)) - h(w{k)\ v(fc)) = h{w(k)) - h{v(k) + n(k)\ v(fc)) 
= h(w(k)) - h(n(k)\v{k)) 



= h(w(k)) - h(n(k)) 


(99) 


> h(w(k)\w k ~ l ) - h(n(k)) 


(100) 


= h({w(k)}) - h(n(k)\ n k ~ 1 ) 


(101) 


= h({w{k)}) - h(n(k)\n k -\v k ) 


(102) 


= h({w(k)})-h(w(k)\w k -\v k ) 


(103) 
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h({w(k)}) - h(w(k)\w k -\i k ) 



(104) 



h({w(k)}) - h(w(k)\w k -\Z°°) 



(105) 



I({x(fc)} ; {w(*)» 



>I({x(fc)};{y(fe)}) 



(106) 



where {x(/c)} is the signal at the output of A{z), see Fig. |2] In the above, (l99l ) follows from the fact 
that {n(/c)} and {x(fc)} are independent and from the fact that F(z) is strictly causal. As a consequence, 
n(k) is independent of v(k), for all k £ Z + . Inequality (1 1001 ) holds from the property < 
with equality if and only if x and y are independent, i.e., if and only if {w(k)} is white. Similarly, (1 1 1 b 
holds since the samples of {n(fc)} are independent. By noting that v fc is a linear combination of x fc and 
n fc_1 , it follows immediately that n(k) is independent from v k upon knowledge of n fc_1 , which leads 
to d 102b - On the other hand, (11031 ) stems from the fact that w k = n k + v fc . Equality in (11041 ) holds from 
the fact that, if w fc_1 is known, then x fc can be obtained deterministically from v fe_1 , and vice-versa, 
see Fig. [2] Equality (11051 ) follows from the fact that there exists no feedback from {w(fc)} to {x(£;)}, 
and thus the Markov chain xS_ 1 <H> (x fc , w^" 1 ) f-> w(£;) holds. On the other hand, 7({x(/c)} ; {w(k)}) > 
I({x(k)} ; {y(k)}), with equality if and only if B(e'' jj ) is invertible for all frequencies co for which 
(^(e^)! > 0. Finally, (11061 ) follows directly from the Data Processing Inequality, with equality if and 
only if S(e Ja; ) is invertible for all frequencies ui for which |j4(e JW )| > 0. 

Since $R_c^{it}(D)$ is by definition an infimum, it follows that, for every $\epsilon > 0$, there exists an output process $\{y'(k)\}$ jointly Gaussian with $\{x(k)\}$, satisfying the causality and distortion constraints and such that $I(\{x(k)\}; \{y'(k)\}) \le R_c^{it}(D) + \epsilon$. Such output can be characterized by its noise PSD, say $S'_u$, and its signal transfer function, say $W'(z)$, by using the model in Fig. 3.

Therefore, all that is needed for the system in Fig. 2 to achieve

$$\frac{1}{2}\ln(K) = I(\{x(k)\}; \{y'(k)\}) \le R_c^{it}(D) + \epsilon \tag{107}$$

is to yield the required noise PSD $S'_u$, the required signal transfer function $W'(z)$, a white $\{w(k)\}$ and satisfy $B(e^{j\omega}) \ne 0$, $\forall\omega : A(e^{j\omega}) \ne 0$. To summarize and to restate the latter more precisely:

$$\text{Equality in (100)} \Longleftarrow S_w(e^{j\omega}) = 1 = |A(e^{j\omega})|^2 S_x(e^{j\omega}) + |1 - F(e^{j\omega})|^2 \sigma_n^2 \tag{108a}$$
$$\text{Equality in (106)} \iff B(e^{j\omega}) \ne 0,\ \forall\omega : A(e^{j\omega}) \ne 0 \tag{108b}$$
$$W(e^{j\omega}) = W'(e^{j\omega}), \qquad S'_u(e^{j\omega}) = |W'(e^{j\omega})|^2\, |B(e^{j\omega})|^2\, |1 - F(e^{j\omega})|^2 \sigma_n^2 \tag{108c}$$

All these equations are to be satisfied a.e. on $[-\pi,\pi]$. We have chosen $S_w = 1$ in (108a) for simplicity and because, as we shall see next, we have enough degrees of freedom to do so without compromising rate/distortion performance. Solving the system of equations formed by (108a), (108c) and (108b) we obtain

$$|B(e^{j\omega})|^2 = \frac{S'_u(e^{j\omega}) + |W'(e^{j\omega})|^2 S_x(e^{j\omega})}{|W'(e^{j\omega})|^2}, \quad \text{a.e. on } [-\pi,\pi] \tag{109a}$$
$$|1 - F(e^{j\omega})|^2 \sigma_n^2 = \frac{S'_u(e^{j\omega})}{S'_u(e^{j\omega}) + |W'(e^{j\omega})|^2 S_x(e^{j\omega})}, \quad \text{a.e. on } [-\pi,\pi] \tag{109b}$$
$$|A(e^{j\omega})|^2 = \frac{|W'(e^{j\omega})|^2}{S'_u(e^{j\omega}) + |W'(e^{j\omega})|^2 S_x(e^{j\omega})}, \quad \text{a.e. on } [-\pi,\pi] \tag{109c}$$



It is only left to be shown that there exist causal, stable and minimum-phase transfer functions $B(z)$, $(1 - F(z))$ and $A(z)$ such that their squared magnitudes equal their right-hand sides in (109). To do so, we will make use of the Paley-Wiener theorem (Theorem 8 in the Appendix).

To begin with, we notice from Fig. 3, and since $\{u'(k)\}$ is independent of $\{x(k)\}$, that

\begin{align}
I(\{x(k)\}; \{y'(k)\}) &= \frac{1}{4\pi}\int_{-\pi}^{\pi}\ln\left(1 + \frac{|W'(e^{j\omega})|^2 S_x(e^{j\omega})}{S'_u(e^{j\omega})}\right) d\omega \tag{110}\\
&= -\frac{1}{4\pi}\int_{-\pi}^{\pi}\ln\left(|1 - F(e^{j\omega})|^2 \sigma_n^2\right) d\omega, \tag{111}
\end{align}

where (111) follows from (109b). Since $R_c^{it}(D)$ is bounded, so is $I(\{x(k)\}; \{y'(k)\})$, and thus we conclude from the Paley-Wiener theorem that there exists a stable, causal and minimum-phase transfer function $(1 - F(z))$ satisfying (109b). Also, from the fact that the first sample of the impulse response of $(1 - F(z))$ is 1 and as a consequence of $(1 - F(z))$ being minimum-phase, we conclude that $\int_{-\pi}^{\pi}\ln|1 - F(e^{j\omega})|\, d\omega = 0$ (see, e.g., (T71)). Therefore,

$$\sigma_n^2 = e^{-2 I(\{x(k)\};\{y'(k)\})}. \tag{112}$$
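The vanishing log-integral of a minimum-phase filter with unit first tap can be checked numerically. The sketch below uses FIR examples with hypothetical tap values: `[1, -0.5]` has its zero inside the unit circle (minimum phase), whereas `[1, -2]` does not, and a midpoint rule approximates the average of $\ln|H(e^{j\omega})|$ over $[-\pi,\pi]$. The last function simply evaluates $\sigma_n^2 = e^{-2I}$ for a mutual information rate `I` given in nats.

```python
import cmath
import math

def mean_log_magnitude(taps, n=4096):
    """Average of ln|H(e^{jw})| over [-pi, pi] for H(z) = sum_m taps[m] z^{-m}."""
    total = 0.0
    for i in range(n):
        w = -math.pi + (i + 0.5) * 2.0 * math.pi / n   # midpoint rule
        H = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(taps))
        total += math.log(abs(H))
    return total / n

def noise_variance(I_nats):
    """Noise variance implied by sigma_n^2 = exp(-2 I), with I in nats."""
    return math.exp(-2.0 * I_nats)

print(mean_log_magnitude([1.0, -0.5]))   # close to 0 (minimum phase)
print(mean_log_magnitude([1.0, -2.0]))   # close to ln 2 (zero outside the unit circle)
print(noise_variance(0.5))
```

With $S_w = 1$, the corresponding value of $K$ would be $1/\sigma_n^2$ under the assumption $K = \sigma_w^2/\sigma_n^2$, so that $\frac{1}{2}\ln K$ equals the mutual information rate.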



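Next, we notice that since $W(z)$ is stable and causal, there exists a causal, stable and minimum-phase transfer function $\tilde{W}(z)$ such that $|\tilde{W}(e^{j\omega})| = |W(e^{j\omega})|$, for all $\omega \in [-\pi,\pi]$. From the Paley-Wiener theorem, it follows that

$$\int_{-\pi}^{\pi}\big|\ln|\tilde{W}(e^{j\omega})|\big|\, d\omega < \infty, \tag{113}$$

which implies that

$$-\infty < \int_{-\pi}^{\pi}\ln|W(e^{j\omega})|\, d\omega < \infty. \tag{114}$$

On the other hand, from the above,

$$R_c^{it}(D) + \epsilon \ge -\frac{1}{4\pi}\int_{-\pi}^{\pi}\ln\left(\frac{S'_u(e^{j\omega})}{|W(e^{j\omega})|^2 S_x(e^{j\omega})}\right) d\omega \tag{115}$$

and recalling that $\frac{1}{2\pi}\int_{-\pi}^{\pi}|\ln S_x(e^{j\omega})|\, d\omega < \infty$, it follows that $\int_{-\pi}^{\pi}\ln\big(S'_u(e^{j\omega})/|W(e^{j\omega})|^2\big)\, d\omega$ is bounded from below. In view of (114), we conclude that $\int_{-\pi}^{\pi}\ln(S'_u(e^{j\omega}))\, d\omega > -\infty$. Now, since $\frac{1}{2\pi}\int_{-\pi}^{\pi} S'_u(e^{j\omega})\, d\omega \le D$, we can apply Lemma 9 (see Appendix) to obtain that

$$\int_{-\pi}^{\pi}\big|\ln S'_u(e^{j\omega})\big|\, d\omega < \infty. \tag{116}$$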

Substitution of the RHS of the second equation of (108c) into the above, together with the Paley-Wiener theorem, yields that there exists a causal, stable and minimum-phase transfer function $G(z)$ such that

\begin{align}
|G(e^{j\omega})|^2 &= S'_u(e^{j\omega}) \tag{117}\\
&= |W(e^{j\omega})|^2\, |B(e^{j\omega})|^2\, |1 - F(e^{j\omega})|^2 \sigma_n^2, \tag{118}
\end{align}

and thus $B(z)$ can be chosen to be the causal, stable and minimum-phase transfer function

$$B(z) = \frac{G(z)}{\tilde{W}(z)\,(1 - F(z))\,\sigma_n},$$

which allows us to choose a stable, causal and minimum-phase $A(z) = B(z)^{-1}$. Therefore, for every $\epsilon > 0$, there exist causal, stable and minimum-phase transfer functions $A(z)$, $B(z)$ and $1 - F(z)$ that satisfy (108), attaining equalities throughout and therefore yielding a value of $K$ which satisfies (107). This completes the proof. □

XV. Proof of Lemma 5

Strict convexity exists if and only if the inequality

$$\lambda J(p_1) + [1-\lambda] J(p_2) > J(\lambda p_1 + [1-\lambda] p_2), \quad \forall\lambda \in (0,1), \tag{119}$$

holds for any two pairs $p_1 = (f_1, g_1) \in \mathcal{F}_K \times \mathcal{G}$ and $p_2 = (f_2, g_2) \in \mathcal{F}_K \times \mathcal{G}$ satisfying

$$\|f_1 - f_2\| + \|g_1 - g_2\| > 0. \tag{120}$$

We will first prove the validity of (119) for pairs $p_1$ and $p_2$ which also satisfy

$$|\lambda g_1(\omega) + [1-\lambda] g_2(\omega)| > 0, \quad \forall\omega \in [-\pi,\pi],\ \forall\lambda \in [0,1], \tag{121}$$

but are otherwise arbitrary. For any given $\lambda \in [0,1]$, define the pair

$$(f_0, g_0) \triangleq \lambda (f_1, g_1) + [1-\lambda](f_2, g_2).$$

Upon defining

$$\eta \triangleq f_2 - f_1, \qquad \theta \triangleq g_2 - g_1, \tag{122}$$

any pair along the "line" between $(f_1, g_1)$ and $(f_2, g_2)$ can be written in terms of a single scalar parameter $s$ via

$$(f, g) = (f_0 + \eta s,\ g_0 + \theta s),$$

where $s \in [\lambda - 1, \lambda]$. Define the functions



M(s) 4 (/, \ g \) = + V s, ^J\g \ 2 + 2K{g e*}s + \6\ 2 s 2 j , (123a) 
V{s) = K — \\ff = K- H/oll 2 - 2(/ , rj)s - H\ 2 s 2 , (123b) 
where lZ{x} denotes the real part of x. Substitution of (1123b into ((4T]) allows one to write the latter as 

j? { f, g ) = J { s)±^^ + L + as + \\6\\ 2 s 2 

where 

a 4 27e{( 5o -G, 0)} 

£ = ||5o|| 2 + ||G|| 2 -2^{( 5 o,G)}. 

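We next show that (119) holds by showing that $d^2 J(s)/ds^2\,|_{s=0} > 0$ for every $\lambda \in [0,1]$. For this purpose, we first take the derivative of $J(s)$ with respect to $s$. Denoting the derivatives of the functions $\mathcal{D}(s)$ and $\mathcal{N}(s)$ with respect to $s$ by $\mathcal{D}'$ and $\mathcal{N}'$, respectively, we have that

$$J'(s) = \frac{2\mathcal{N}\mathcal{N}'\mathcal{D} - \mathcal{N}^2\mathcal{D}'}{\mathcal{D}^2} + \alpha + 2\|\theta\|^2 s.$$

Differentiating again, one arrives at

\begin{align}
J''(s) &= \frac{2(\mathcal{N}'\mathcal{D} - \mathcal{N}\mathcal{D}')^2 + 2\mathcal{N}\mathcal{N}''\mathcal{D}^2 - \mathcal{N}^2\mathcal{D}''\mathcal{D}}{\mathcal{D}^3} + 2\|\theta\|^2 \notag\\
&= \frac{2(\mathcal{N}'\mathcal{D} - \mathcal{N}\mathcal{D}')^2 + (2\mathcal{N}\mathcal{N}''\mathcal{D} - \mathcal{N}^2\mathcal{D}'' + 2\|\theta\|^2\mathcal{D}^2)\mathcal{D}}{\mathcal{D}^3}. \tag{124}
\end{align}

From (124), we have that

$$J''(s)\big|_{s=0} = \frac{2(\mathcal{N}_0'\mathcal{D}_0 - \mathcal{N}_0\mathcal{D}_0')^2}{\mathcal{D}_0^3} + \frac{2\mathcal{N}_0\mathcal{N}_0''\mathcal{D}_0 - \mathcal{N}_0^2\mathcal{D}_0'' + 2\|\theta\|^2\mathcal{D}_0^2}{\mathcal{D}_0^2}, \tag{125}$$

where

\begin{align}
\mathcal{N}_0 &\triangleq \mathcal{N}(s)\big|_{s=0} = \langle f_0, |g_0| \rangle \notag\\
\mathcal{N}_0' &\triangleq \mathcal{N}'(s)\big|_{s=0} = \Big\langle f_0, \frac{c}{|g_0|}\Big\rangle + \langle \eta, |g_0| \rangle \notag\\
\mathcal{N}_0'' &\triangleq \mathcal{N}''(s)\big|_{s=0} = \Big\langle f_0, \frac{|g_0|^2|\theta|^2 - c^2}{|g_0|^3}\Big\rangle + 2\Big\langle \eta, \frac{c}{|g_0|}\Big\rangle \tag{126}\\
\mathcal{D}_0 &\triangleq \mathcal{D}(s)\big|_{s=0} = K - \|f_0\|^2 \notag\\
\mathcal{D}_0' &\triangleq \frac{d\mathcal{D}(s)}{ds}\Big|_{s=0} = -2\langle f_0, \eta\rangle \notag\\
\mathcal{D}_0'' &\triangleq \frac{d^2\mathcal{D}(s)}{ds^2}\Big|_{s=0} = -2\|\eta\|^2, \notag
\end{align}

see (123), and where

$$c \triangleq \Re\{g_0\theta^*\}. \tag{127}$$

Notice that $\mathcal{N}_0'$ and $\mathcal{N}_0''$ in (126) are well defined since we are considering pairs $p_1$ and $p_2$ for which (121) holds.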
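The second-derivative expression for $J(s) = \mathcal{N}(s)^2/\mathcal{D}(s) + L + \alpha s + \|\theta\|^2 s^2$ can be sanity-checked against a finite-difference approximation. In the Python sketch below, the polynomials standing in for $\mathcal{N}(s)$ and $\mathcal{D}(s)$ and all constants are arbitrary, hypothetical choices used only to exercise the formula.

```python
def second_derivative_formula(N, dN, d2N, D, dD, d2D, theta_sq):
    """J''(s) = [2(N'D - ND')^2 + (2NN''D - N^2 D'' + 2|theta|^2 D^2) D] / D^3."""
    return (2.0 * (dN * D - N * dD) ** 2
            + (2.0 * N * d2N * D - N * N * d2D + 2.0 * theta_sq * D * D) * D) / D ** 3

# Hypothetical smooth stand-ins for N(s) and D(s), plus arbitrary constants.
L_CONST, ALPHA, THETA_SQ = 0.7, -0.1, 0.6

def N_of(s):
    return 1.0 + 0.3 * s + 0.2 * s * s

def D_of(s):
    return 2.0 - 0.5 * s - 0.4 * s * s

def J(s):
    return N_of(s) ** 2 / D_of(s) + L_CONST + ALPHA * s + THETA_SQ * s * s

# Derivatives of the stand-ins at s = 0: N=1, N'=0.3, N''=0.4, D=2, D'=-0.5, D''=-0.8.
analytic = second_derivative_formula(1.0, 0.3, 0.4, 2.0, -0.5, -0.8, THETA_SQ)
h = 1e-3
numeric = (J(h) - 2.0 * J(0.0) + J(-h)) / (h * h)   # central difference
print(analytic, numeric)
```

Both values agree to within the discretization error of the central difference, which supports the algebra leading to the closed-form expression.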

Substitution of (126) into (125) yields

+ 4WnP n (r7, ^) + 2W n 2 ||r7|| 2 + 2\\9\\ 2 V 2 n 



2(A/^ - M,P ) 2 2A/ ^o (/o, J^^jp-) + Wofr, ^) + 2W ( ?||r ? || 2 + 2\\9\\ 2 V 2 



u u 

(a) 2((/o,^)Po + 2AA (/o, ?? )) 2 2/V 2 N| 2 + 4M V ( V , ^) + 2||g|| 2 P 2 

- n + vi 

(±) 2(/ ,Po^{^*} + 2A/- r / ) 2 2AA 2 ||r ? || 2 + WpEWfa y )} + 2 ||fl|| 2 P 2 

~ TP 1 ©2 

^0 ^0 

2 



2(ft{(/o,2/Vo*7 + ^oft0*)}) 2||AT r? + A)fi0*|| 2 



„ 3 + > 0, (128) 

u u 

where $(a)$ and $(b)$ follow from (126), (127) and from the fact that $\Re\{g_0\theta^*\} \le |g_0|\,|\theta|$. The strict inequality in (128) stems from the fact that $\|\eta\| + \|\theta\| > 0$. The latter follows directly from (122) and (120). Therefore (119) holds for any two pairs $p_1 = (f_1, g_1)$, $p_2 = (f_2, g_2) \in \mathcal{F}_K \times \mathcal{G}$ satisfying (121).

We will show now that (119) also holds for pairs $p_1, p_2$ which do not satisfy (121). The idea is to construct another pair, say $p_1^{\delta}, p_2^{\delta}$, "close" to $p_1, p_2$ and meeting (121), and then show that strict convexity along the straight line between $p_1^{\delta}$ and $p_2^{\delta}$ implies strict convexity along the straight line between $p_1$ and $p_2$.

For this purpose, define, for any given pairs $p_1 = (f_1, g_1) \in \mathcal{F}_K \times \mathcal{G}$, $p_2 = (f_2, g_2) \in \mathcal{F}_K \times \mathcal{G}$, the family of functions



hs(uj) 



, if |(ftH| + |s2(w)|=0 

, if Ac/i(w) + [1 - X]g 2 (u) = for some A G (0, 1) 
, in any other case. 



mWn 



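where $\delta > 0$ is a scalar parameter. The functions $h_\delta$ defined above exhibit the property (to be exploited below) that

$$|\lambda[g_1(\omega) + h_\delta(\omega)] + [1-\lambda][g_2(\omega) + h_\delta(\omega)]| > 0, \quad \forall g_1, g_2 \in \mathcal{G},\ \forall\delta > 0,\ \forall\lambda \in (0,1). \tag{129}$$

Upon introducing the notation $p^{\delta} \triangleq p + (0, h_\delta)$ and $g^{\delta} \triangleq g + h_\delta$, it follows directly from (129) that $p_1^{\delta}, p_2^{\delta}$ satisfy (121) for all pairs $p_1, p_2 \in \mathcal{F}_K \times \mathcal{G}$. Notice also that

$$\|g - g^{\delta}\| \le \delta. \tag{130}$$

On the other hand, it is easy to show that $J(p)$ is uniformly continuous at $\lambda p_1 + [1-\lambda] p_2$ for any pairs $p_1, p_2 \in \mathcal{F}_K \times \mathcal{G}$ and for all $\lambda \in [0,1]$. In view of (130), uniform continuity of $J(p)$ means that, for every $\epsilon > 0$, there exists $\delta = \delta(\epsilon) > 0$ such that

$$|J(p^{\delta}) - J(p)| < \epsilon, \quad \forall p = \lambda p_1 + [1-\lambda] p_2,\ \forall\lambda \in (0,1). \tag{131}$$

The fact that $p_1^{\delta}$ and $p_2^{\delta}$ satisfy (121) implies that $p_1^{\delta}, p_2^{\delta}$ also satisfy the strict-convexity condition (119). Therefore, for each $\lambda \in (0,1)$, there exists $\epsilon_2(\lambda) > 0$ such that

$$\lambda J(p_1^{\delta}) + [1-\lambda] J(p_2^{\delta}) - J(\lambda p_1^{\delta} + [1-\lambda] p_2^{\delta}) \ge \epsilon_2(\lambda) > 0, \quad \forall\lambda \in (0,1). \tag{132}$$

Then, from (131) and (132),

\begin{align}
\lambda J(p_1) + [1-\lambda] J(p_2) &\ge \lambda J(p_1^{\delta}) + [1-\lambda] J(p_2^{\delta}) - 2\epsilon \ge J(\lambda p_1^{\delta} + [1-\lambda] p_2^{\delta}) + \epsilon_2(\lambda) - 2\epsilon \notag\\
&\ge J(\lambda p_1 + [1-\lambda] p_2) + \epsilon_2(\lambda) - 3\epsilon. \notag
\end{align}

Since $\delta$ can be chosen arbitrarily small, and in particular, strictly smaller than $\delta(\epsilon_2(\lambda)/3) > 0$, it follows that (119) also holds for all pairs $p_1, p_2 \in \mathcal{F}_K \times \mathcal{G}$ not satisfying (121). This completes the proof. □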



May 3, 2011 



DRAFT 






XVI. Appendix

Lemma 6: For any zero-mean Gaussian stationary source $\{x(k)\}$ and $D > 0$,

$$R_c^{it}(D) \ge \limsup_{k\to\infty} R_c^{it(k)}(D). \tag{133}$$

Proof: Suppose (133) does not hold, i.e., that

$$V \triangleq \limsup_{k\to\infty} R_c^{it(k)}(D) = R_c^{it}(D) + \epsilon_1, \tag{134}$$

for some $\epsilon_1 > 0$. The definition of $R_c^{it}(D)$ in (137) means that, $\forall\epsilon_2 > 0$, there exists $y \in \mathcal{S}$ such that

$$\limsup_{k\to\infty} \frac{1}{k} I(x^k; y^k) \le R_c^{it}(D) + \epsilon_2. \tag{135}$$

Combining this inequality with (134) we arrive at

$$V = \limsup_{k\to\infty} \inf \frac{1}{k} I(x^k; y^k) \le \limsup_{k\to\infty} \frac{1}{k} I(x^k; y^k) \le R_c^{it}(D) + \epsilon_2. \tag{136}$$

Since $\epsilon_2$ can be chosen to be arbitrarily small, it can always be chosen so that $\epsilon_2 < \epsilon_1$, which contradicts (134). Therefore (133) holds. ■
Lemma 7: Let

$$R_c^{it}(D) \triangleq \inf_{\{y(k)\} \in \mathcal{S}}\ \limsup_{k\to\infty} \frac{1}{k} I(x^k; y^k), \tag{137}$$

where $\mathcal{S}$ denotes the space of all random processes causally related to $\{x(k)\}$. Let

$$R_c^{it(k)}(D) \triangleq \inf_{y^k :\ \{y(k)\} \in \mathcal{S}} \frac{1}{k} I(x^k; y^k). \tag{138}$$

Then, for any first-order Gauss-Markov source, the following holds:

$$R_c^{it}(D) = \limsup_{k\to\infty} R_c^{it(k)}(D). \tag{139}$$

▲

Proof: In Lemma 6 in the Appendix it is shown that

$$R_c^{it}(D) \ge \limsup_{k\to\infty} R_c^{it(k)}(D), \tag{140}$$

so all we need to demonstrate is that $R_c^{it}(D) \le \limsup_{k\to\infty} R_c^{it(k)}(D)$. To do this, we simply observe from Theorem 2 that if we construct an output process $\{y(k)\}$ by using the recursive algorithm of that theorem, with the choice $d_k = D$, for all $k \in \mathbb{N}$, then this output process is such that $I(\{x(k)\}; \{y(k)\})$ equals $V \triangleq \lim_{k\to\infty} R_c^{it(k)}(D)$. Therefore, $R_c^{it}(D) \le V$, concluding the proof. ■

Proposition 1 (MMSE Column Correspondence): Let $x \in \mathbb{R}^{\ell}$ be a Gaussian random vector source with covariance matrix $K_x$. A reconstruction Gaussian random vector $y$ satisfies

$$\mathrm{E}[x_k \mid y^k] = y_k \tag{141}$$

if and only if

$$K_y\, e_{k,k} = K_{yx}\, e_{k,k}. \tag{142}$$

▲

Proof: We have that

$$K_{yx}\, e_{k,k} - K_y\, e_{k,k} = \mathrm{E}[x_k y^k] - \mathrm{E}[y_k y^k] = \mathrm{E}[(x_k - y_k) y^k]. \tag{143}$$

The proof is completed by noting that $\mathrm{E}[x_k \mid y^k] = y_k$ if and only if $\mathrm{E}[(x_k - y_k) y^k] = 0$. ■
Lemma 8 (MMSE Triangular Correspondence): Let $x \in \mathbb{R}^{N}$, with $N \in \mathbb{N}$, be a Gaussian random source vector with covariance matrix $K_x$. A reconstruction Gaussian random vector $y$ satisfies

$$\mathrm{E}[x_k \mid y^k] = y_k, \quad \forall k = 1,2,\dots,N \tag{144}$$

if and only if

$$[K_y]_{j,k} = [K_{yx}]_{j,k}, \quad \forall j \le k,\ j,k = 1,2,\dots,N. \tag{145}$$

▲

Proof: Let us first introduce the notation $[M]^{k} \in \mathbb{R}^{k\times k}$, denoting the top-left submatrix of any given square matrix $M \in \mathbb{R}^{N\times N}$, with $N \ge k$. From Proposition 1 it immediately follows that, for every $k = 1,2,\dots,N$,

$$[K_y]^{k}\, e_{k,k} = K_{y^k}\, e_{k,k} = K_{y^k x^k}\, e_{k,k} = [K_{yx}]^{k}\, e_{k,k}, \tag{146}$$

which is equivalent to (145). ■

Lemma 8 implies that, if the reconstruction $y$ is the output of a causal Wiener filter applied to the noisy source $x + n$ for some noise vector $n$ (a condition equivalent to (144)), then $K_y$ and $K_{yx}$ have identical entries on and above their main diagonals.

Paley-Wiener Theorem:

Theorem 8 (From KT2\ p. 229]): Let $g(e^{j\omega})$ be a non-negative function defined on $(-\pi,\pi]$. There exists a unique stable, causal and minimum-phase transfer function $Y(z)$ such that $|Y(e^{j\omega})|^2 = g(e^{j\omega})$ if and only if

$$\int_{-\pi}^{\pi}\big|\log(g(e^{j\omega}))\big|\, d\omega < \infty. \tag{147}$$

▲

Lemma 9: If $f(\omega) \ge 0$, $\forall\omega \in [-\pi,\pi]$, and is such that $\int_{-\pi}^{\pi} f(\omega)\, d\omega < \infty$ and $\int_{-\pi}^{\pi}\ln f(\omega)\, d\omega > -\infty$, then

$$\int_{-\pi}^{\pi}|\ln f(\omega)|\, d\omega < \infty. \tag{148}$$

▲

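Proof: Let $\mathcal{S} \triangleq \{\omega \in [-\pi,\pi] : f(\omega) > 1\}$ and let $\mathcal{S}^c$ denote its complement in $[-\pi,\pi]$. From Jensen's inequality and the fact that $\int_{-\pi}^{\pi} f(\omega)\, d\omega < \infty$, we have

$$\int_{\mathcal{S}}\ln f(\omega)\, d\omega \le |\mathcal{S}|\,\ln\left(\frac{1}{|\mathcal{S}|}\int_{\mathcal{S}} f(\omega)\, d\omega\right) < \infty. \tag{149}$$

This, together with the condition $\int_{-\pi}^{\pi}\ln f(\omega)\, d\omega > -\infty$, implies that

$$-\int_{\mathcal{S}^c}\ln f(\omega)\, d\omega < \infty. \tag{150}$$

Therefore,

$$\int_{-\pi}^{\pi}|\ln f(\omega)|\, d\omega = -\int_{\mathcal{S}^c}\ln f(\omega)\, d\omega + \int_{\mathcal{S}}\ln f(\omega)\, d\omega < \infty, \tag{151}$$

completing the proof. ■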